Preface



CICLing 2004 was the 5th Annual Conference on Intelligent Text Processing 
and Computational Linguistics; see www.CICLing.org. CICLing conferences are 
intended to provide a balanced view of the cutting-edge developments in both 
theoretical foundations of computational linguistics and the practice of natural 
language text processing with its numerous applications. A feature of CICLing 
conferences is their wide scope that covers nearly all areas of computational 
linguistics and all aspects of natural language processing applications. These 
conferences are a forum for dialogue between the specialists working in the two 
areas. 

This year we were honored by the presence of our invited speakers Martin 
Kay of Stanford University, Philip Resnik of the University of Maryland, Ricardo 
Baeza-Yates of the University of Chile, and Nick Campbell of the ATR Spoken 
Language Translation Research Laboratories. They delivered excellent extended 
lectures and organized vivid discussions. 

Of 129 submissions received (74 full papers and 44 short papers), after careful 
international reviewing 74 papers were selected for presentation (40 full papers 
and 35 short papers), written by 176 authors from 21 countries: Korea (37), Spain 
(34), Japan (22), Mexico (15), China (11), Germany (10), Ireland (10), UK (10), 
Singapore (6), Canada (3), Czech Rep. (3), France (3), Brazil (2), Sweden (2), 
Taiwan (2), Turkey (2), USA (2), Chile (1), Romania (1), Thailand (1), and The 
Netherlands (1); the figures in parentheses stand for the number of authors from 
the corresponding country. 

In addition to a high scientific level, one of the success factors of CICLing 
conferences is their excellent cultural programs. CICLing 2004 was held in Korea, 
the beautiful and wonderful Country of the Morning Calm, as Korean people 
call their land. The participants enjoyed three full-day excursions to the most 
important natural and historical attractions around Seoul city; see photos at 
www.CICLing.org. Full-day excursions allowed for friendly personal interaction 
between participants and gave them a chance to make friends with the most 
famous experts in the field, people who are not easily accessible at larger
conferences.

A conference is the result of the work of many people. First of all I would 
like to thank the members of the Program Committee for the time and effort 
they devoted to the reviewing of the submitted articles and to the selection 
process. Especially helpful were Manuel Vilares, John Tait, Alma Kharrat, Karin 
Verspoor, Viktor Pekar, and many others - a complete list would be too long. 

Obviously I thank the authors for their patience in the preparation of the 
papers, not to mention the very development of their scientific results that form 
this book. I also express my most cordial thanks to the members of the local 
Organizing Committee for their considerable contribution to making this
conference a reality. Last but not least, I thank our host - the ITRI of the Chung-Ang
University. I would also like to thank RITOS-2 of CYTED for their support of
the CICLing conferences.

December 2003 Alexander Gelbukh
Towards an LFG Syntax-Semantics Interface
for Frame Semantics Annotation 



Anette Frank 1 and Katrin Erk 2 
Anette.Frank@dfki.de,
66123 Saarbrücken, Germany
Abstract. We present an LFG syntax-semantics interface for the
semi-automatic annotation of frame semantic roles for German in the SALSA
project. The architecture is intended to support a bootstrapping cycle 
for the acquisition of stochastic models for frame semantic role assign- 
ment, starting from manual annotations on the basis of the syntacti- 
cally annotated TIGER treebank, with smooth transition to automatic 
syntactic analysis and (semi-)automatic semantic annotation of a much 
larger corpus, on top of a free-running LFG grammar of German. Our 
study investigates the applicability of the LFG formalism for modeling 
frame semantic role annotation, and designs a flexible and extensible 
syntax-semantics architecture that supports the induction of stochastic 
models for automatic frame assignment. We propose a method famil- 
iar from example-based Machine Translation to translate between the 
TIGER and LFG annotation formats, thus enabling the transition from 
treebank annotation to large-scale corpus processing. 



1 Introduction 

This paper is a first study of an LFG syntax-semantics interface for frame
semantic role assignment. The architecture is intended to support semi-automatic
semantic annotation for German in SALSA - the Saarbrücken Semantics
Annotation and Analysis project 1 - which is based on Frame Semantics and is
conducted in cooperation with the Berkeley FrameNet project [1,15].

The aim of SALSA is to create a large lexical semantics resource for German
based on Frame Semantics, and to develop methods for the automated annotation
of corpora with frame semantic representations.

In the first (and current) phase of the SALSA project, semantic annotation is
fully manual, and takes as its base the syntactically annotated TIGER treebank
[2]. 2 Due to the inherently sparser data seeds for semantic frames (as opposed to
syntactic structures), it will be of utmost importance for the acquisition of
high-performing stochastic models to process and collect data from larger corpora.
In the second project phase we will thus proceed to semi-automatic semantic
annotation of a much bigger, unparsed corpus. Here, a reliable and informative
syntactic parse is essential: first, as a basis for semantic annotation, and second,
since part of the information to be acquired is in itself syntactic.

1 See [8] and the SALSA project homepage http://www.coli.uni-sb.de/lexicon

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 1-13, 2004.

Similar to the approach taken for syntactic annotation of the NEGRA cor- 
pus in [3,4], SALSA aims at a bootstrapping approach for semantic annotation. 
Stochastic models for frame annotation are learned from a seed set of manual 
annotations, thus speeding up the manual annotation process and yielding more 
data for learning. Ultimately, we will learn increasingly refined models for frame 
assignment, by automatic annotation and re-training on larger corpora. 

In the remainder of this paper, we discuss diverse architectures to imple- 
ment a bootstrapping cycle for frame annotation that bridges the gap between 
treebank-based and large-scale free text processing. We investigate the applica- 
bility of the LFG formalism for the frame annotation task, and design an LFG 
syntax-semantics interface for frame assignment. We discuss alternative mod- 
els for the interface, in terms of co-description and description by analysis and 
discuss their implications in terms of disambiguation effects and the integration 
of additional knowledge sources. Finally, we present a method for learning the 
required mappings between LFG and TIGER-SALSA representations. 



2 Annotating TIGER with Frame Semantic Roles 

The TIGER Corpus [2] is a large syntactically annotated corpus of German. 
The annotation scheme is surface oriented and comparably theory-neutral. The 
dependency-oriented constituent structures provide information about grammat- 
ical functions (on edge labels) and syntactic categories (on constituent node la- 
bels). An example is given in the shaded tree of Fig. 1. 

The FrameNet Project [1,15] is based on Fillmore’s Frame Semantics. A 
frame is a conceptual structure describing a situation. It is introduced by a target 
or frame-evoking element (FEE). Roles, called frame elements (FEs), are local 
to particular frames and identify the participants and props of the described 
situations. The aim of FrameNet is to provide a comprehensive frame-semantic 
description of the core lexicon of English. The current on-line version of the frame 
database consists of about 400 frames, covering about 6,900 lexical entries. 

The SALSA Project [8] annotates frames on top of the TIGER treebank. 
Frames are represented as flat, independent trees, as shown in the white-labeled 
trees with curved edges in Fig. 1. The root is labeled with the frame name. Edges 
are labeled by frame elements or by ’FEE’ and point to syntactic constituents. 

2 With 80,000 sentences, TIGER is comparable in size to the Penn Treebank. From
our current gold corpus we estimate an average of about 3 frames per sentence, thus
approx. 240,000 frame annotations for the entire TIGER corpus. This number is
comparable to the English FrameNet resource used in [13,10].
SPD requests that coalition talk about reform



Fig. 1. TIGER-SALSA graphical annotation 

Fig. 1 contains two FEEs: fordert ... auf (auffordern) and Gespräch. auffordern
evokes the frame REQUEST. As the FEE does not form a single syntactic
constituent, the label FEE is assigned to two edges. The SPEAKER is the subject
(SB) NP SPD, the ADDRESSEE is the direct object (OA) NP Koalition, and the
MESSAGE is the modifier (MO) PP zu Gespräch über Reform. The second FEE,
the noun Gespräch, introduces the frame CONVERSATION, in which two groups
talk to one another. The only NP-internal frame element is the TOPIC (“what the
message is about”) über Reform, whereas the INTERLOCUTOR-1 (“the prominent
participant in the conversation”) is realized by the direct object of auffordern.

Both the syntactic annotation of the TIGER corpus and the frames and 
frame elements that SALSA is adding are encoded in a modular XML format. 

3 A Bootstrapping Architecture for Frame Annotation 

The bootstrapping cycle for automatic frame and frame element assignment that 
we envision is similar to the process applied for NEGRA in [4]: First, stochastic 
models for frame and frame element annotation are learned from a seed set of 
manual annotations of the TIGER corpus. These models are applied to support 
interactive semi-automatic annotation of new portions of TIGER, with human 
annotators accepting or correcting assignments proposed by the system. New 
stochastic models derived from this larger set of TIGER data are applied for 
(semi-) automatic annotation of a larger, automatically parsed corpus, which 
again yields more training data, and continuously refined stochastic models. 
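
As an illustration only, the cycle described above can be sketched in a few lines of Python. The sketch is not the SALSA implementation: the "stochastic model" is replaced by a most-frequent-frame-per-lemma table, and the names `train`, `bootstrap` and `review` as well as the toy data are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train(annotated):
    """Toy stand-in for a stochastic model: most frequent frame per lemma."""
    counts = defaultdict(Counter)
    for lemma, frame in annotated:
        counts[lemma][frame] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}

def bootstrap(seed, batches, review):
    """Alternate between training and (semi-)automatic annotation.

    seed    -- manually annotated (lemma, frame) pairs
    batches -- lists of unannotated lemmas, one list per cycle
    review  -- stands in for the human annotator: receives the lemma and
               the proposed frame (or None) and returns the final frame
    """
    annotated = list(seed)
    model = train(annotated)
    for batch in batches:
        for lemma in batch:
            proposal = model.get(lemma)           # system proposes a frame
            annotated.append((lemma, review(lemma, proposal)))
        model = train(annotated)                  # re-train on the larger set
    return model, annotated

# Hypothetical gold decisions of the annotator.
gold = {"auffordern": "REQUEST", "Gespräch": "CONVERSATION", "fordern": "REQUEST"}

def review(lemma, proposal):
    # Accept the proposal if there is one, otherwise annotate manually.
    return proposal if proposal is not None else gold[lemma]

model, data = bootstrap([("auffordern", "REQUEST")],
                        [["Gespräch", "auffordern"], ["fordern", "Gespräch"]],
                        review)
```

Each pass enlarges the training set, so that in later cycles the annotator mostly confirms proposals instead of annotating from scratch.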



3.1 From Treebank Annotation to Free Text Processing 

To implement this bootstrapping cycle, we need a syntactic analyzer for free 
German text processing that (i) provides fine-grained syntactic information that 
is compatible with TIGER syntactic annotations and allows us to map the
analyses of the syntactic analyzer to and from the TIGER-SALSA syntactic and
semantic annotation format, and that (ii) delivers a high percentage of correctly 
analyzed data. Finally, (iii) we aim at a probabilistic parsing architecture that 
allows us to study the potential of semantics-driven syntactic disambiguation. 

The most straightforward scenario is to employ a parser that delivers the 
same type of representations as used in the TIGER treebank. Yet, while first 
attempts to derive probabilistic grammars for German from the TIGER (or 
NEGRA) treebank [7,6] are encouraging, they are still in need of improvement. 3 

Another possibility is to employ a broad-coverage parser for German that
provides an analysis as fine-grained as that exploited in the TIGER-SALSA
annotations, and to provide a conversion routine for its output to match
the TIGER format, or - conversely - to port the manually created TIGER-SALSA
annotations to the output representation of such a parser.

In the first case, with TIGER syntax as main format, stochastic models 
would be derived from a combination of TIGER syntax and frame annotation. 
Transfer from the parser’s output to the TIGER format would be needed in 
all phases of the cycle. In particular, the parser output for any corpus would 
have to be transformed to TIGER syntax. In the second case, with the parser’s 
output as main format, stochastic models would be derived from a combination 
of the parser’s format and frame annotation, which means that a semantic frame 
projection for the parser output is needed. Transfer between TIGER-SALSA and 
parser output representation would be needed only in the first phases of the cycle, 
while processing data of the TIGER corpus. 4 Moreover, this scenario lends itself 
to an integrated semantic disambiguation model in the sense of (iii). 

3.2 German LFG for Corpus Processing and Frame Annotation 

We propose to use a German LFG grammar to support the bootstrapping cycle 
for frame annotation. The TIGER annotation process was supported by semi- 
automatic processing with a German LFG grammar [2, 20]. 5 In addition to the 
LFG-to-TIGER transfer module developed there, [11] has recently built a map- 
ping from TIGER to LFG f-structures. These automatic conversions ensure that 
LFG representations are rich enough to match the syntactic TIGER represen- 
tations. [2] report a coverage of 50% for the LFG grammar, with 70% precision. 
Newer figures are not yet available, but we expect the German grammar to soon 
reach the performance of the English LFG grammar described in [18]. 

We further opt for the second scenario of the previous paragraph: using LFG
f-structures as the primary basis for building stochastic models. This scenario
requires the design of an LFG semantics projection for frame assignment, and
a mapping between TIGER-SALSA and LFG syntax-semantics representations
to implement the bootstrapping cycle. However, it restricts transformations
between syntactic formats to the learning phase, and lends itself to an exploration
of semantic features for syntactic disambiguation. A further advantage of this
model is that it allows for the extension of existing probabilistic methods for
syntactic disambiguation in [18] to online semantic classification and
disambiguation. The stochastic tools employed in [18,17] - provided with the LFG
processing platform XLE - support training and online application of loglinear
models. We can thus explore the disambiguation effects of semantic annotation
in combination with, or independently of, syntactic disambiguation.

3 [7] do not assign functional labels, whereas [6] produce LFG (proto) f-structures.
Though not fully comparable, [6] could be used for our purposes in similar ways,
and possibly in tandem with the manually developed LFG grammar described below.

4 Manual annotation is aided by an annotation tool based on the TIGER-SALSA
format [9]. Also, the TIGER-SALSA corpus is intended as a theory-neutral reference
corpus, and must include sentences that are out-of-coverage for the chosen parser.

5 The grammar is being developed at the IMS, University of Stuttgart.



4 LFG for Frame Annotation: Chances and Challenges 

In the following sections we investigate the applicability of LFG for the frame 
annotation task, and design a syntax-semantics interface for Frame Semantics. 

Lexical Functional Grammar [5] assumes multiple levels of representation 
for linguistic description. Most prominent are the syntactic representations of 
c(onstituent)- and f(unctional)-structure. The correspondence between c- and 
f-structure is defined by functional annotations of CFG rules and lexical entries. 
This architecture can be extended to semantic (and other) projection levels [14]. 

The f-structure representation abstracts away from surface-syntactic proper- 
ties, and allows for uniform reference to syntactic dependents in diverse syntactic 
configurations. This is important for the task of frame annotation, as it abstracts 
away from aspects of syntax that are irrelevant to frame (element) assignment. 
Fig. 2. LFG projection architecture with c- and f-structure representation



The LFG syntactic analysis of word order, control and raising constructions, 
long-distance dependencies and coordination provides f-structure representations 
where non-local or implicit arguments are localized and thus allow for uniform 
association of local grammatical functions with frame semantic roles. 

In (1), the SELLER role can be uniformly associated with the local SUBJect
of sell, even though it is realized as (a.) a relative pronoun of come that controls the
SUBJect of sell, (b.) an implicit second person SUBJ, (c.) a non-overt SUBJ
controlled by the OBLique object of hard, and (d.) a SUBJ (we) in VP coordination.
(1) a. .. the woman who had come in to sell flowers to the customers overheard
       their conversation .. (from [15])
    b. Don’t sell the factory to another company.
    c. It would be hard for him to sell Newmont shares quickly. (from [15])
    d. .. we decided to sink some of our capital, buy a car, and sell it again
       before leaving. (from [15])

More challenging are phenomena as in (2.a,b), where the SUBJ of sell is 
not syntactically represented as identical to (a.) the passive SUBJ of the matrix 
clause, or (b.) the matrix SUBJ of an embedded adjunct clause containing sell. 
Here the SELLER semantic role has to be assigned nonlocally (unless coreference 
information is made available). 

(2) a. .. the old adage about most people simply refusing to move rather than
       sell their house ..
    b. .. we'd do the maintenance and watering instead of just selling the
       plants .. (both from [15])

There are cases where a frame-evoking element and one of its FEs are both
parts of a single compound, e.g. in (3) the noun modifier Auto fills the GOODS
role in the COMMERCE frame evoked by the head noun Verkäufer. The LFG
f-structure analysis of nominal compounds provides a (flat) decomposition into a
nominal head and a set NMOD of noun modifiers. The NMOD modifier Auto can
thus be represented as filling the GOODS role in the frame evoked by the head noun.

(3) Autoverkäufer geben zur Zeit bis zu 10% Rabatt.
    Car dealers offer nowadays up to 10% reduction.

Formal Devices. The LFG formalism provides powerful descriptional de- 
vices that are essential for the design of a flexible syntax-semantics interface. 

The regular expression-based specification of uncertain embedding paths
within f-structures - both outside-in and inside-out [16] - makes it possible
to refer to any piece of f-structure from anywhere within the f-structure.

The restriction operator [19] permits reference to partial f-structures. It can 
be used to link semantic roles to partial f-structures, such as grammatical func- 
tions to the exclusion of embedded material (e.g. sentential adjuncts). 

Examples that use these devices will be discussed in Section 5.1. 
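
To make the two directions of functional uncertainty concrete, the following Python sketch treats an f-structure as a nested dictionary and a path as a space-separated sequence of attribute names matched against a regular expression. This is a simplification for illustration, not the XLE mechanism: set-valued attributes are ignored, and the helper names `paths`, `outside_in` and `inside_out` as well as the simplified f-structure are assumptions.

```python
import re

def paths(fs, prefix=()):
    """Yield (path, f-structure) for fs itself and every embedded f-structure."""
    yield prefix, fs
    for attr, val in fs.items():
        if isinstance(val, dict):
            yield from paths(val, prefix + (attr,))

def outside_in(fs, pattern):
    """All f-structures reachable from fs via a path matching the regex."""
    rx = re.compile(pattern)
    return [v for p, v in paths(fs) if rx.fullmatch(" ".join(p))]

def inside_out(root, inner, pattern):
    """All f-structures g inside root with (g PATH) = inner, PATH matching the regex."""
    rx = re.compile(pattern)
    hosts = []
    for _, g in paths(root):
        for p, v in paths(g):
            if v is inner and p and rx.fullmatch(" ".join(p)):
                hosts.append(g)
    return hosts

# Simplified f-structure for "SPD fordert Koalition zu Gespräch über Reform auf".
gespraech = {"PRED": "Gespräch",
             "ADJ": {"PRED": "über", "OBJ": {"PRED": "Reform"}}}
f = {"PRED": "auffordern",
     "SUBJ": {"PRED": "SPD"},
     "OBJ": {"PRED": "Koalition"},
     "OBL": {"PRED": "zu", "OBJ": gespraech}}

# Outside-in: every OBJ at any depth below the root.
objs = [v["PRED"] for v in outside_in(f, r"(\w+ )*OBJ")]

# Inside-out: the f-structure whose OBL OBJ is the noun's f-structure.
hosts = inside_out(f, gespraech, r"OBL OBJ")
```

Here `objs` collects the predicates Koalition, Gespräch and Reform, and `hosts` contains only the f-structure of the verb, from which the verb's own OBJ can then be reached.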

5 LFG Syntax-Semantics Interface for Frame Semantics

5.1 A Frame Semantics Projection 

As a direct transposition of the SALSA annotation format we can define a Frame
Semantics projection $\sigma_f$ from the level of f-structure (compare Figs. 1 and 3).

While in the traditional LFG projection architecture (as in [14]) f-structure
predicates are related to predicate-argument structures in s-structure, we
define the $\sigma_f$-projection to introduce elementary frame structures, with attributes
FRAME, FEE (frame-evoking element), and frame-specific role attributes. Fig. 3
displays the $\sigma_f$-projection for the sentence in Fig. 1. 6

Fig. 4 states the lexical constraints that define this mapping. $\sigma_f$ is defined as
a function of f-structure. Thus, the verb auffordern introduces a node $\sigma_f(\uparrow)$ in
the frame semantics projection of $\uparrow$, its local f-structure, and defines its attributes
FRAME and FEE. The frame elements are defined as $\sigma_f$-projections of the verb’s
SUBJ, OBJ, and OBL OBJ functions. For example, the SPKR role, referred to as
$(\sigma_f(\uparrow)\ \mathrm{SPKR})$, the SPKR attribute in the frame projection $\sigma_f(\uparrow)$ of $\uparrow$, is defined
as identical to the $\sigma_f$-projection of the verb’s SUBJ function, $\sigma_f(\uparrow \mathrm{SUBJ})$. 7

The noun Gespräch, which evokes the CONVERSATION frame, illustrates the
use of inside-out functional equations to refer to material outside the local
f-structure of a frame-evoking predicate. The INTERLOCUTOR-1 (INTLC_1) role
corresponds to the OBJ of auffordern. This function is accessible from the noun’s
f-structure via the inside-out equation $((\mathrm{OBL\ OBJ} \uparrow)\ \mathrm{OBJ})$: starting from $\uparrow$ (the
f-structure of Gespräch), the path leads inside-out to the f-structure $(\mathrm{OBL\ OBJ} \uparrow)$
of the verb, from which it descends to the verb’s OBJ: $((\mathrm{OBL\ OBJ} \uparrow)\ \mathrm{OBJ})$.



[Figure 3: f-structure and frame projection for the sentence in Fig. 1. Recoverable
f-structure fragment: PRED ‘AUFFORDERN<(SUBJ)(OBJ)(OBL)>’; SUBJ [PRED ‘SPD’];
OBJ [PRED ‘KOALITION’]; OBL [PRED ‘ZU<(OBJ)>’]]

Fig. 3. LFG projection architecture for Frame Annotation



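fordert V,
    $(\uparrow \mathrm{PRED}) = \text{'AUFFORDERN}\langle(\uparrow \mathrm{SUBJ})(\uparrow \mathrm{OBJ})(\uparrow \mathrm{OBL})\rangle\text{'}$
    $(\sigma_f(\uparrow)\ \mathrm{FRAME}) = \mathrm{REQUEST}$
    $(\sigma_f(\uparrow)\ \mathrm{FEE}) = (\uparrow \mathrm{PRED\ FN})$
    $(\sigma_f(\uparrow)\ \mathrm{SPKR}) = \sigma_f(\uparrow \mathrm{SUBJ})$
    $(\sigma_f(\uparrow)\ \mathrm{ADD}) = \sigma_f(\uparrow \mathrm{OBJ})$
    $(\sigma_f(\uparrow)\ \mathrm{MSG}) = \sigma_f(\uparrow \mathrm{OBL\ OBJ})$

Gespräch N,
    $(\uparrow \mathrm{PRED}) = \text{'GESPRÄCH'}$
    $(\sigma_f(\uparrow)\ \mathrm{FRAME}) = \mathrm{CONVERSATION}$
    $(\sigma_f(\uparrow)\ \mathrm{FEE}) = (\uparrow \mathrm{PRED\ FN})$
    $(\sigma_f(\uparrow)\ \mathrm{INTLC\_1}) = \sigma_f((\mathrm{OBL\ OBJ} \uparrow)\ \mathrm{OBJ})$
    $(\sigma_f(\uparrow)\ \mathrm{TOPIC}) = \sigma_f(\uparrow \mathrm{ADJ\ OBJ})$



Fig. 4. Frame projection by co-description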
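
The lexical constraints of Fig. 4 can be read operationally, as in the following Python sketch. It is an illustration under simplifying assumptions, not the grammar's constraint solver: f-structures are nested dictionaries, each role is given as a pair of an inside-out path and an outside-in path, and the names `LEXICON`, `walk`, `nodes` and `project` are hypothetical.

```python
# Each role maps to (inside-out path, outside-in path); an empty
# inside-out path means "start at the local f-structure".
LEXICON = {
    "auffordern": ("REQUEST", {"SPKR": ((), ("SUBJ",)),
                               "ADD": ((), ("OBJ",)),
                               "MSG": ((), ("OBL", "OBJ"))}),
    "Gespräch": ("CONVERSATION", {"INTLC_1": (("OBL", "OBJ"), ("OBJ",)),
                                  "TOPIC": ((), ("ADJ", "OBJ"))}),
}

def walk(fs, path):
    """Follow attributes downwards; None if the path is undefined."""
    for attr in path:
        fs = fs.get(attr) if isinstance(fs, dict) else None
        if fs is None:
            return None
    return fs

def nodes(fs):
    """Yield fs and all embedded f-structures."""
    yield fs
    for val in fs.values():
        if isinstance(val, dict):
            yield from nodes(val)

def project(root):
    """Build the frame projection; return the semantic node of the root."""
    sigma = {}                                  # id(f-structure) -> frame node
    def sem(f):
        return sigma.setdefault(id(f), {})
    for f in nodes(root):
        entry = LEXICON.get(f.get("PRED"))
        if entry is None:
            continue
        frame, roles = entry
        node = sem(f)
        node["FRAME"], node["FEE"] = frame, f["PRED"]
        for role, (up, down) in roles.items():
            for g in nodes(root):
                if walk(g, up) is f:            # inside-out step: (g up) = f
                    target = walk(g, down)      # outside-in step from g
                    if target is not None:
                        node[role] = sem(target)
    return sem(root)

f = {"PRED": "auffordern",
     "SUBJ": {"PRED": "SPD"},
     "OBJ": {"PRED": "Koalition"},
     "OBL": {"PRED": "zu",
             "OBJ": {"PRED": "Gespräch",
                     "ADJ": {"PRED": "über", "OBJ": {"PRED": "Reform"}}}}}
request = project(f)
```

Because every f-structure has exactly one semantic node, the value of MSG in the REQUEST frame is the same object that carries the CONVERSATION frame, and its INTLC_1 role is the very node that fills ADD.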



Frames in Context. The projection of frames in context can yield partially
connected frame structures. In Fig. 3, Gespräch maps to the MSG of REQUEST and
also introduces a frame of its own, CONVERSATION. Due to the syntactic relation
$(f_1\ \mathrm{OBL\ OBJ}) = f_2$ (with $f_1$ and $f_2$ the f-structures of auffordern and Gespräch,
respectively), the expressions $(\sigma_f(f_1)\ \mathrm{MSG})$, $\sigma_f(f_1\ \mathrm{OBL\ OBJ})$ and $\sigma_f(f_2)$ all refer
to a single node in the $\sigma_f$-projection. The CONVERSATION frame is thus defined
as an instantiation, in context, of the MSG role of a REQUEST frame.

6 In this paper we omit details involving set-based representations for ADJuncts.

7 The MSG is coindexed with the lower frame, a projection of the noun Gespräch.



(4) a. Haft für Blutpanschen gefordert
       [NP_SB Haft] [PP_MO für Blutpanschen] [VV_HD gefordert]
       ‘Prison sentence demanded for unsanitary blood collection’
    b. fordert: $(\sigma_f(\uparrow)\ \mathrm{MSG}) = \sigma_f(\uparrow \mathrm{SUBJ})$
                $(\sigma_f(\uparrow)\ \mathrm{MSG}) = \sigma_f(\uparrow \mathrm{ADJ\ OBJ})$
Special Configurations. Potentially problematic are configurations where
multiple syntactic constituents are mapped to a single semantic role, as they
may lead to an inconsistency in the $\sigma_f$-projection. 8

An example is shown in (4). The SUBJect Haft and the modifier PP für
Blutpanschen have jointly been annotated as the MSG role in the REQUEST frame
of fordern. The projection of the MSG role from two constituents can be modeled
by the equations in (4.b). Yet this simple model will lead to an inconsistency if the
involved predicates introduce individual frames, at the same level of embedding.

In (4), the SUBJ Haft evokes a frame PUNISHMENT, in which the modifier für
Blutpanschen fills the REASON role, as defined in (4.c). Due to this embedding
asymmetry of SUBJ and modifier at the semantic level, the joint equations in
(4.b,c) do not lead to inconsistency, but to a circular semantic structure: by (4.b),
ADJ OBJ and SUBJ are mapped to the same $\sigma_f$ value both in (4.b) and (4.c), so
in (4.c) the REASON of PUNISHMENT and the PUNISHMENT frame itself have to
be equal - which is not a correct representation of the meaning of the sentence.

We found that in the SALSA annotations asymmetric embedding at the
semantic level is the typical pattern for constituents that jointly constitute a single
frame element. We therefore propose to make use of functional uncertainty
equations to accommodate embedded frames within either one of the
otherwise re-entrant constituents. In (4.b), we thus relax the equation mapping the
PP to MSG to $(\sigma_f(\uparrow)\ \mathrm{MSG\ ROLE}^*) = \sigma_f(\uparrow \mathrm{ADJ\ OBJ})$, with ROLE instantiating to
REASON in (4). In this way, the functional uncertainty over possible semantic
roles accommodates (possibly unassigned) asymmetrically embedded frames.

8 In the existing annotations, 909 (or 1.2%) of the frame elements match this pattern.

(5) der von der SPD geforderte Einstieg in eine Ökosteuerreform
    the by the SPD demanded start of an ecological tax reform
    ‘the start of an ecological tax reform, demanded by the SPD’

Another typical configuration where discontinuous constituents correspond
to a single semantic role is illustrated in (5): der and Einstieg in eine
Ökosteuerreform correspond to the MSG of a REQUEST, which is introduced by the
adjectival head geforderte within the modifier of the phrase. Its by-phrase adjunct
fills the SPKR role. This case differs from the one above in that the discontinuous
constituents jointly form a headed phrase (with a local PRED in f-structure).

This configuration is similar to the well-known head-switching phenomena,
and can be represented by use of the restriction operator [19]. The expression
$(\uparrow\backslash\{\mathrm{MOD}\})$ refers to the partial f-structure (displayed as a copy in (5)) consisting
of $\uparrow$ without the function MOD. This unit can be defined to fill the MSG role of
REQUEST. Since the frame-evoking head is embedded within MOD itself, this
involves an inside-out functional equation: $(\sigma_f(\uparrow)\ \mathrm{MSG}) = \sigma_f((\mathrm{MOD} \uparrow)\backslash\{\mathrm{MOD}\})$.
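
The effect of the restriction operator can be pictured with a small Python sketch over dictionary-based f-structures. The function name `restrict`, the attribute names SPEC and OBL, and the f-structure for (5) are simplified assumptions for illustration; the actual LFG analysis may differ in detail.

```python
def restrict(fs, attrs):
    """Return fs without the given attributes (the original is not modified)."""
    return {a: v for a, v in fs.items() if a not in attrs}

# Simplified f-structure for (5); "geforderte" heads the MOD function.
np_fs = {"PRED": "Einstieg",
         "SPEC": {"PRED": "der"},
         "OBL": {"PRED": "in", "OBJ": {"PRED": "Ökosteuerreform"}},
         "MOD": {"PRED": "fordern",
                 "ADJ": {"PRED": "von", "OBJ": {"PRED": "SPD"}}}}

head = np_fs["MOD"]             # local f-structure of the frame-evoking head
host = np_fs                    # reached inside-out from the head via MOD
msg = restrict(host, {"MOD"})   # the unit that fills the MSG role
```

The restricted structure `msg` contains the determiner, the head noun and its PP, but not the modifier that evokes the frame, which avoids a frame containing itself as one of its own roles.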



5.2 Co-description vs. Description by Analysis 

Co-description. In the projection architecture we just presented, f- and s- 
structure equations jointly determine the valid analyses of a sentence. This 
method of defining and evaluating projection levels is called co-description. 

With co-description, syntactic and semantic analysis interact, leading to
semantics-driven syntactic disambiguation. Our example sentence in Fig. 1 is
syntactically four-way ambiguous: SPD and Koalition, being unmarked for case,
can both be SUBJ or OBJ; the PP über Reform can be attached to Gespräch (as
displayed), or be an adjunct of auffordern. However, the semantic constraints for
Gespräch in Fig. 4 define its role TOPIC as a PP adjunct (ADJ OBJ) of the local
head. This eliminates the readings where Reform is adjoined to the verb.

Description by Analysis (DBA). An alternative to the co-descriptive
model is semantics construction via description by analysis [14]. Here,
semantics is built on top of fully resolved (disjunctive) f-structure analyses. Analyses
that are consistent with syntax-semantics mapping constraints are semantically
enriched - while remaining solutions are left untouched.

Technically, this architecture can be realized by use of a term rewriting system
as employed in transfer. 9 In a transfer approach, feature structures are described
by sets of predicates. Non-prefixed predicates are constraints on the applicability
of a rule, to be used e.g. for describing the shape of the f-structure:

pred(A,auffordern), subj(A,B), obj(A,C), obl(A,D), obj(D,E)

Here, features are encoded by predicates that take as arguments atomic values
or variables for feature structure nodes. Predicates prefixed with + introduce
new nodes and values: Encoding the $\sigma_f$-projection by a predicate sem_f, we can
enrich the matched f-structure with the frame information for auffordern and
link the SPKR role to the $\sigma_f$-projection of the SUBJ SPD:

+sem_f(A,SemA), +frame(SemA,request), +fee(SemA,auffordern),
+sem_f(B,SemB), +spkr(SemA,SemB)
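
The rule above can be mimicked by a toy rewriting step in Python. This is a sketch of the general idea only and not the XLE transfer component, whose rule syntax and semantics are richer: facts are tuples, capitalized strings in a rule are variables, and the function names `is_var`, `match` and `apply_rule` as well as the node names `f0`, `s0`, ... are assumptions of the example.

```python
from itertools import count

def is_var(term):
    """Rule variables are capitalized, as in the rule notation above."""
    return isinstance(term, str) and term[:1].isupper()

def match(conds, facts, env=None):
    """Yield every variable binding under which all conditions are facts."""
    env = env or {}
    if not conds:
        yield env
        return
    (name, *args), rest = conds[0], conds[1:]
    for fname, *fargs in facts:
        if fname != name or len(fargs) != len(args):
            continue
        new = dict(env)
        for a, v in zip(args, fargs):
            if is_var(a):
                if new.setdefault(a, v) != v:   # clashes with earlier binding
                    break
            elif a != v:                        # constant does not match
                break
        else:
            yield from match(rest, facts, new)

def apply_rule(conds, additions, facts):
    """Add the '+' predicates for every match; unmatched input stays as is."""
    fresh = count()
    out = set(facts)
    for env in match(conds, facts):
        env = dict(env)
        for name, *args in additions:
            vals = []
            for a in args:
                if is_var(a) and a not in env:
                    env[a] = f"s{next(fresh)}"  # new semantic node
                vals.append(env.get(a, a))
            out.add((name, *vals))
    return out

facts = {("pred", "f0", "auffordern"), ("subj", "f0", "f1"),
         ("pred", "f1", "spd"), ("obj", "f0", "f2"),
         ("obl", "f0", "f3"), ("obj", "f3", "f4")}
conds = [("pred", "A", "auffordern"), ("subj", "A", "B"), ("obj", "A", "C"),
         ("obl", "A", "D"), ("obj", "D", "E")]
additions = [("sem_f", "A", "SemA"), ("frame", "SemA", "request"),
             ("fee", "SemA", "auffordern"), ("sem_f", "B", "SemB"),
             ("spkr", "SemA", "SemB")]
enriched = apply_rule(conds, additions, facts)
```

If the conditions do not match, `apply_rule` returns the input facts unchanged, which corresponds to leaving the remaining solutions untouched.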

Implications. Both models are equally powerful in terms of expressiveness. 10
While co-description integrates the frame semantics projection into the gram- 
mar and parsing process, DBA keeps it as a separate module. This means that 
DBA is more suited for the development phase of LFG-based frame assignment, 
while co-description, which is particularly interesting for studying joint syntactic 
and semantic disambiguation, may be used in later stages. With DBA, seman- 
tics does not interfere with grammar design and can be developed separately. 
Subsequently the transfer rule sets can be automatically converted to equivalent 
co-description constraints. Due to its greater modularity, the DBA approach also 
facilitates extensions of the projection architecture to include external semantic 
knowledge sources, such as word sense, named entity typing, and coreference. 



6 Learning Translations between Representations 

In the previous section, we investigated representational aspects of an LFG 
syntax-semantics interface for frame assignment. To implement the full boot- 
strapping cycle for (semi-) automatic frame assignment (cf. Sec. 3), we finally 
need a mapping to translate between TIGER-SALSA representations and LFG 
representations with frame semantics projection. With such a mapping, we can 
(i) port TIGER-SALSA annotations to the LFG format, to build a seed corpus 
for stochastic modeling, and (ii) extract transfer-based frame assignment rules 
from the seed annotations, to disjunctively apply them to new sentences. In 
the reverse direction, we can (iii) convert automatically assigned frames to the 
TIGER-SALSA format, to be corrected or confirmed by human annotators. 

Transfer-based conversions between the LFG and TIGER formats have been
built in [20,11]. But the transfer rules need to be updated with every change
of the grammar. Instead, we propose to learn translations between LFG and
TIGER formats using a method inspired by Example-based MT. We use the
aligned LFG-TIGER treebank of [11] as a “parallel corpus”. Starting out with
pairs of TIGER and LFG structures, we want to obtain parallel path descriptions
that - within the respective syntactic structures - identify the relevant frame
(evoking) elements. Since we are operating on identical sentences, we can use
the surface strings to establish the corresponding path descriptions. 11

9 The XLE includes a transfer component that operates on packed f-structures [12].
10 Except for functional uncertainty, which in transfer can only be of bounded length.

For example, the paths that identify the SPKR in our running example can 
be described by the correlated TIGER and LFG path expressions (cf. Figs. 1,3): 

        TIGER path    string   LFG f-struct. path   LFG c-struct. path

SPKR    [S,SB,NE]     SPD      [SUBJ]               [S,NP]

Paths are given from the root down. For TIGER, we use paths with alternat- 
ing categorial (node) and functional (edge) labels. In the LFG path descriptions 
functional and categorial descriptions are separated. To avoid spurious ambigu- 
ities in case of non-branching structures, we choose the shortest path (highest 
constituent) that yields the exact target string. For frame (evoking) elements 
that correspond to multiple or discontinuous elements (such as fordert auf) we 
generate a list of paths for the individual constituents: 

           TIGER path       string    LFG f-struct. path   LFG c-struct. path

REQUEST    [S,HD,VVFIN]     fordert   [PRED]               [S,VP,V]
           [S,SVP,PTKVZ]    auf                            [S,SVP,PTKVZ]

With these correspondences we can port frame annotations from the TIGER- 
SALSA corpus to the parallel LFG corpus, and freely translate between these 
formats. They can further be used to extract generalized transfer frame annota- 
tion rules, for application to new LFG-parsed sentences. 
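
The path extraction described above, choosing the highest constituent whose yield is exactly the target string, can be sketched in Python as follows. The tree encoding, the function names `leaves` and `find_path`, and the labels inside the PP are assumptions for this illustration; only the paths for SPD, fordert and auf are taken from the tables above.

```python
def leaves(node):
    """Terminal yield of a node; a node is (category, word) or
    (category, [(edge_label, child), ...])."""
    cat, body = node
    if isinstance(body, str):
        return [body]
    return [w for _, child in body for w in leaves(child)]

def find_path(node, target, path=()):
    """Alternating category/edge labels from the root down to the highest
    constituent whose yield equals target (a list of words), or None."""
    cat, body = node
    path = path + (cat,)
    if leaves(node) == target:
        return list(path)          # highest match: do not descend further
    if isinstance(body, str):
        return None
    for edge, child in body:
        found = find_path(child, target, path + (edge,))
        if found is not None:
            return found
    return None

# Simplified TIGER-style tree for the running example.
tree = ("S", [
    ("SB", ("NE", "SPD")),
    ("HD", ("VVFIN", "fordert")),
    ("OA", ("NN", "Koalition")),
    ("MO", ("PP", [("AC", ("APPR", "zu")),
                   ("NK", ("NN", "Gespräch")),
                   ("MNR", ("PP", [("AC", ("APPR", "über")),
                                   ("NK", ("NN", "Reform"))]))])),
    ("SVP", ("PTKVZ", "auf")),
])

spkr_path = find_path(tree, ["SPD"])
# A discontinuous FEE yields one path per constituent.
fee_paths = [find_path(tree, [w]) for w in ("fordert", "auf")]
```

Matching on the surface string alone would pick the first occurrence if a word is repeated in the sentence; a fuller version would match on token positions instead.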

This method depends on a sufficiently rich set of seed annotations as training 
data, and for refinement of the rule extraction algorithm. This is ensured by the 
first bootstrapping cycles, with annotations being checked by human annotators. 

7 Conclusions and Perspectives 

This study investigates a general architecture for (semi-) automatic frame assign- 
ment that supports the transition from treebank-based annotation to large-scale 
corpus processing in a bootstrapping architecture, using LFG as the underlying 
syntactic formalism. Besides linguistic considerations, this choice is motivated 
by the availability of a large-scale German LFG grammar and a powerful pro- 
cessing platform that includes a translation component and tools for stochastic 
modeling. This combination will allow us to study the (combined and individual) 
effects of syntactic and semantic disambiguation. 

We designed an LFG syntax-semantics interface for frame semantics and
showed how to address potentially problematic configurations. To our knowledge,
this is the first study to investigate frame semantics as a target representation
for semantics construction from syntax. We discussed two architectures for this
syntax-semantics interface: the co-descriptive model, where semantic construction
is integrated into the grammar, and description by analysis, which works as
a separate module and is more robust. Rules for frame semantics projection can
be derived from the annotated TIGER-SALSA corpus, given a mapping between
the TIGER and LFG syntax formats. We propose to learn this mapping from
the ‘aligned’ TIGER and LFG annotations of the TIGER corpus, to alleviate
the maintenance problem of hand-coded transfer rules for corpus conversion.

11 Hence the relation to EBMT, where translation rules are learned from examples.
Here, we learn correspondences between syntactic structures for ‘identical’ languages.

Abstract. Korean has a complex inflectional system, showing agglutinative
morphology and using affixation as the major mechanism for word
formation. The successful development of any syntactic/semantic
parser for the language thus hinges on an efficient lexicon
that can syntactically expand its lexical entries and map them into syntax
and semantics with robust parsing performance. This paper reports on the
Korean Resource Grammar, developed as an extension of
HPSG (Head-driven Phrase Structure Grammar), and on the results of
implementing it in the Linguistic Knowledge Building (LKB) system (cf.
Copestake 2002). The paper shows that the present grammar is
theoretically as well as computationally efficient enough for parsing
Korean sentences.



1 Korean Resource Grammar 

The Korean Resource Grammar (KRG) is a computational grammar for Korean
that has been under development since October 2002 (cf. Kim and Yang 2003). Its
aim is to develop an open source grammar of Korean. The grammatical framework
for the KRG is the constraint-based grammar HPSG (cf. Sag, Wasow, and
Bender 2003). HPSG (Head-driven Phrase Structure Grammar) is built upon
a non-derivational, constraint-based, and surface-oriented grammatical architecture.
HPSG seeks to model human languages as systems of constraints on typed
feature structures. In particular, the grammar adopts the mechanism of a type
hierarchy in which every linguistic sign is typed with appropriate constraints
and hierarchically organized. This property of typed feature structure
formalisms facilitates the extension of the grammar in a systematic and efficient
way, resulting in linguistically precise and theoretically motivated descriptions
of languages, including Korean. Hierarchical classification essentially
assigns linguistic entities such as phrases and words to specific types,
and assigns those types to superordinate types. Each type is declared
to obey certain constraints corresponding to properties shared by all members of
that type. This system then allows us to express cross-classifying generalizations
about phrases and words, while accommodating the idiosyncrasies of individual
types on particular subtypes of phrases or words.

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 14-25, 2004.

© Springer-Verlag Berlin Heidelberg 2004

As the basic tool for writing, testing, and processing the Korean Resource
Grammar, we adopt the LKB (Linguistic Knowledge Building) system (Copestake
2002). The LKB system is a grammar and lexicon development environment
for use with constraint-based linguistic formalisms such as HPSG. 1

The Korean Resource Grammar consists of grammar rules, inflection rules, 
lexical rules, type definitions, and lexicon. All the linguistic information is rep- 
resented in terms of signs. These signs are classified into subtypes as represented 
in a simple hierarchy in (1): 



(1) sign
      lex-st
        verbal, nominal, adverbial, adnominal
      syn-st
        word
        phrase
          hd-arg-ph, hd-mod-ph, hd-word-ph



The elements of the type lex-st (lexical-structure), forming the basic components of
the lexicon, are built up by lexical processes such as lexical rules. Some of
these elements will be realized as word and function as a syntactic element, that is, as an
element of syn-st (syntactic-structure). Phrases projected from word form the basic
well-formed phrases of Korean, such as hd-arg-ph (head-argument-ph) and hd-mod-ph
(head-modifier-ph). In what follows, we discuss how such projections are
possible within the typed feature system of the KRG.
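
The subtype relation that underlies the hierarchy in (1) can be illustrated with a small lookup table. The sketch below is an assumption-laden toy, not the LKB encoding: the `PARENT` dictionary and the `subsumes` function are hypothetical names, and placing verbal, nominal, adverbial, and adnominal under lex-st follows the prose description of lex-st rather than an explicit declaration in the paper.

```python
# Each type points to its immediate supertype; "sign" is the root.
PARENT = {
    "lex-st": "sign", "syn-st": "sign",
    "verbal": "lex-st", "nominal": "lex-st",
    "adverbial": "lex-st", "adnominal": "lex-st",
    "word": "syn-st", "phrase": "syn-st",
    "hd-arg-ph": "phrase", "hd-mod-ph": "phrase", "hd-word-ph": "phrase",
}


def subsumes(general, specific):
    """True if `specific` is `general` itself or one of its subtypes."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)   # None once we climb past the root
    return False


print(subsumes("syn-st", "hd-arg-ph"))   # a head-argument phrase is a syn-st
print(subsumes("lex-st", "word"))        # a word is not a lex-st
```

A constraint stated on a type then holds for every type it subsumes, which is what allows the grammar to state a generalization once on a supertype.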



2 Building the Lexicon through a Templatic Approach 

The verb in Korean cannot be an independent word without inflectional suffixes.
The suffixes cannot be attached arbitrarily to a stem or word, but need to observe
a regular fixed order. Reflecting this, the verbal morphology has traditionally
been assumed to be templatic. The template in (2) is a simplified one for the
verbal suffixes in Korean, assumed in Cho and Sells (1994), among others. 2

(2) V-base + (Passive/Causative) + (Hon) + (Tense) + Mood + (Comp)

As can be seen from the above template, verb suffixes, attaching to the preceding 
verb stem or word, mark honorific, tense, and mood functions. Morphologically, 
the inflectional suffixes preceding Mood are optional, but a Mood suffix obli- 
gatorily needs to be attached to a verb stem in simple independent sentences. 
Thus the verbal stem and the mood suffix are mutually bound in the sense that 
the bare verb stem cannot be used uninflected in any syntactic context and it 
should be inflected at least with the mood suffix, as seen in (3). 

1 The LKB is freely available with open source (http://ling.stanford.edu). 

2 Abbreviations adopted in this paper are as follows: Acc (Accusative), Comp
(Complementizer), Conj (Conjunction), Decl (Declarative), Del (Delimiter), Gen
(Genitive), Hon (Honorific), Imper (Imperative), Loc (Locative), Nom (Nominative), Nmlz
(Nominalizer), Pl (Plural), Postp (Postposition), Prop (Propositive), tns (tense), Sug
(Suggestive).

(3) a. ilk-(ess)-ta ‘read-(Past)-Decl’
b. *ilk-ess ‘read-Past’ 

Also, as expected from the template, the verbal suffixes observe the rigid 
ordering restrictions: the template ordering cannot be violated. 

(4) a. *cap-ass-si-ta ‘catch-Past-Hon-Decl’ 
b. *cap-ta-ass ‘catch-Decl-Past’ 

The template given in (2) appears to capture the ordering generalizations as
well as the combinatory possibilities of verbal suffixes. However, the template alone
could generate some ill-formed combinations, as given in (5).

(5) a. ka-(*si)-(*ess)-ca ‘go-Hon-Past-Prop (Let’s go!)’ 
b. ka-(*si)-(*ess)-la ‘go-Hon-Past-Imper (Go!)’ 

If we simply assume the template in (2) with the given suffixes in each slot, we
would allow the ill-formed combinations here. The propositive mood suffix -ca
and the imperative mood suffix -la cannot combine with either the honorific suffix
or the tense suffix. They can combine only with a verb root, as in ka-ca
‘go-Prop’ and ka-la ‘go-Imper’. This means that verbal suffixes like -ca and -la
have their own selectional or co-occurrence restrictions in addition to
occupying the Mood slot. The template alone thus fails to describe all the
combinatory possibilities and demands additional mechanisms. In addition, once
other types of verbal elements such as complementizer words
or subordinator words are taken into consideration, more templates are called for.
Even leaving aside the issue
of empty elements when optional suffixes are not realized, a templatic approach
appears not to properly reflect the morphological structure of Korean inflections
(cf. Kim 1998).
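
The overgeneration problem can be made concrete with a toy recognizer for the template in (2). This is a minimal sketch, assuming that each slot is filled only by the suffixes that appear in examples (3)-(7); the `TEMPLATE` table and the `template_accepts` function are hypothetical and are not part of the KRG.

```python
# (slot name, suffixes that may fill it, obligatory?) in template order.
TEMPLATE = [
    ("Passive/Causative", {"hi"}, False),
    ("Hon", {"si"}, False),
    ("Tense", {"ess", "ass"}, False),
    ("Mood", {"ta", "ca", "la"}, True),
    ("Comp", {"ko"}, False),
]


def template_accepts(suffixes):
    """Check a suffix sequence against the slots, left to right."""
    i = 0
    for _, fillers, obligatory in TEMPLATE:
        if i < len(suffixes) and suffixes[i] in fillers:
            i += 1                      # slot filled by the next suffix
        elif obligatory:
            return False                # the Mood slot may not stay empty
    return i == len(suffixes)           # no suffix may be left over


print(template_accepts(["ess", "ta"]))        # (3a) ilk-ess-ta
print(template_accepts(["ess"]))              # (3b) *ilk-ess
print(template_accepts(["ass", "si", "ta"]))  # (4a) *cap-ass-si-ta
print(template_accepts(["si", "ess", "ca"]))  # (5a) *ka-si-ess-ca, wrongly accepted
```

The recognizer rejects the ordering violations in (3b) and (4), but it accepts the ill-formed ka-si-ess-ca of (5a), because nothing in a purely positional template records that -ca selects a bare verb root.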



3 A Type-Hierarchy Approach



3.1 Verbal Morphology 



The starting point of structuring the lexicon in the KRG is parts of speech in
the language. Like the traditional literature, the KRG assumes verbal, nominal,
adverbial, and adnominal as the language’s basic categories. These are further
subclassified into subtypes. For example, the type verbal is taken to have the
hierarchy given in (6):



(6) verbal
      v-stem
        v-tns-stem
          v-hon-stem
            v-lxm
              aux-v, cop-v, main-v
            v-hon
          v-tns
            v-pst
            v-pres
              v-st-pres, v-nonst-pres
            v-fut
        v-free

Such a classification aims to capture the basic verbal morphology of Korean.
In turn, it means a verbal element will be built up step by step, starting from 
v-lxm (v-lexeme) to v-free: 

(7) a. [[[[[cap + hi] + si] + ess] + ta] + ko] ‘catch-Caus-Hon-Past-Decl-Comp’
    b. v-lxm → v-hon (v-hon-stem) → v-tns (v-tns-stem) → v-free (v-stem) →
       v-comp



Such building processes are constrained by the type declarations, some of which 
are given in (8): 3 



(8) a. v-hon:
       [ ORTH  [1] + si
         STEM  [ v-base
                 ORTH [1] ]
         SYN.HEAD.HON  + ]

    b. v-tns-stem:
       [ STEM  [ v-hon-stem
                 SYN       [1]
                 SEM.RELS  [2] ]
         SYN       [1]
         SEM.RELS  [2] ⊕ [3] ]

    c. v-free:
       [ STEM  v-tns-stem
         SYN.HEAD.IC  bool ]



The constraints in (8a) mean that the type v-hon will take v-base as its stem;
those in (8b) mean that the type v-tns-stem will take an instance of v-hon-stem 
as its stem. One thing to note here is that any subtypes of v-hon-stem can serve 
as the stem of v-tns-stem in accordance with the type hierarchy system. The 
grammar makes only the instances of v-free serve as an input to syntax. 

These constraints restrict the possible word internal structures in Korean 
word formation. The system could provide a clean account for the ill-formed 
combinations without employing mechanisms such as templates. Observe the 
following: 

(9) a. *[v-hon-stem [v-tns-stem cap-ass]-si]-ta ‘catch-Past-Hon-Decl’
    b. *[v-free [v-hon-stem cap-usi]-ta]-ess ‘catch-Hon-Decl-Past’
    c. *[v-hon-stem [v-hon-stem cap-usi]-usi]-ess-ta ‘catch-Hon-Hon-Past-Decl’

(9a) is ruled out because the honorific suffix co-occurs with the v-tns-stem, violating
(8a); (9b) is ill-formed since the past tense suffix -ess is attached to the v-free
stem. This violates the constraint (8b), which requires the stem value to be v-hon-stem
or any of its subtypes. In the same vein, (9c) is not generated because the
second honorific suffix occurs not with a v-base, but with a v-hon stem.

One important question arises: why do we need the notion of types in
morphological theory? The reason is simply that any morphological theory for
Korean needs certain notions similar to types. There are cases where we need
a notion referring to a specific group of morphological objects, so as to
predict that a certain morphological phenomenon applies only to this group. As
noted, only instances of v-free can be pumped up to v-word occurring in syntax. 4

3 The implemented feature descriptions in the LKB system are slightly different from 
those represented here. 

4 The type v-free is further subtyped into v-ind(ependent), v-dep(endent), and
v-ger(undive). Each of these functions as an independent syntactic element: v-ind
functions as a predicate in the independent clause, while v-dep words are used as
dependent verbs such as complementizer or subordinator predicates.

Being a subtype of v-free, v-sug-infm (v-suggestive-informal) requires its STEM value
to be v-base, as represented in the following:

(10) a. *[v-hon-stem ilk-usi]-ca ‘read-Hon-Sug’
     b. [v-base ilk]-ca ‘read-Sug’

In a template analysis like (2), this would mean that the honorific and tense slots
must be empty, which would surely make the grammar much more complicated. However,
the present type-based system efficiently avoids such an issue by simply
referring to the type v-base as the STEM value of the type v-sug-infm.
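
The way the STEM constraints rule out (9) and (10a) can be sketched as follows. This is a simplified illustration, not the LKB implementation: the `PARENT` and `SUFFIX` tables and the `build` function are hypothetical, v-base is identified with v-lxm as an assumption, and the result of attaching -ca is recorded simply as v-free rather than as its subtype v-sug-infm.

```python
# Immediate supertypes, following the hierarchy in (6).
PARENT = {
    "v-tns-stem": "v-stem", "v-free": "v-stem",
    "v-hon-stem": "v-tns-stem", "v-tns": "v-tns-stem",
    "v-lxm": "v-hon-stem", "v-hon": "v-hon-stem",
}

# suffix -> (type required of its STEM, type of the result)
SUFFIX = {
    "si": ("v-lxm", "v-hon"),        # (8a); v-base taken to be v-lxm (assumption)
    "ess": ("v-hon-stem", "v-tns"),  # (8b)
    "ta": ("v-tns-stem", "v-free"),  # (8c)
    "ca": ("v-lxm", "v-free"),       # v-sug-infm: STEM must be v-base
}


def is_subtype(t, target):
    while t is not None:
        if t == target:
            return True
        t = PARENT.get(t)
    return False


def build(suffixes, start="v-lxm"):
    """Attach suffixes one by one; return the final type, or None if a
    suffix meets a stem that violates its STEM constraint."""
    current = start
    for s in suffixes:
        required, result = SUFFIX[s]
        if not is_subtype(current, required):
            return None
        current = result
    return current


print(build(["si", "ess", "ta"]))   # cap-usi-ess-ta: well-formed
print(build(["ess", "si", "ta"]))   # (9a): rejected
print(build(["si", "ca"]))          # (10a): rejected
print(build(["ca"]))                # (10b): well-formed
```

Because a v-lxm is a subtype of v-hon-stem and of v-tns-stem, optional suffixes can be skipped without positing empty slots, and only a sequence ending in v-free would be passed on to syntax.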



3.2 Nominal Morphology 

Nominal inflection is basically different from verbal inflection. Although
nominal suffixes, like verbal ones, are under tight ordering restrictions, all
the nominal suffixes are optional, as shown in the following template and
an actual example:

(11) N-base - (Hon) - (Pl) - (Postp) - (Conj) - (X-Delim) - (Z-Delim)

(12) sensayng + (nim) + (tul) + (eykey) + (man) + (un)
     teacher + Hon + Pl + Postp + X-Delim + Z-Delim
     ‘to the (honorable) teachers only’



All the suffixes here (often called particles), encoding various grammatical
functions, need not be realized. Traditionally, particles are treated as independent
words even though they act more like verbal suffixes in terms of strict ordering
restrictions, no intervention by any word element, and so forth. Our grammar,
following lexicalist perspectives (cf. Cho and Sells 1994, Kim 1996), takes a quite
different approach: we take particles not to exist as independent words but to
function as optional inflectional suffixes. As a starting point, the KRG sets up
different types of nominals corresponding to meaningful classes, as represented
in the hierarchy in (13):



(13) nominal

       nom-stem3   nom-xdel     n-cmkr   n-dmkr
       nom-stem2   nom-conj-p
       nom-stem1   nom-p
       nom-base    nom-pl



The building process of nominal elements starts from the type nom-base, which
includes subtypes such as vn, n-bn, n-cn, n-cl, and n-prop (verbal nouns, bound
nouns, common nouns, classifiers, and proper nouns). Just like the process of building
verbal elements, nominal word formation observes this hierarchical process:

(14) nom-base → nom-stem1 → nom-stem2 → nom-stem3 → nom-stem4 → nom-stem



One crucial difference from the formation of verbal elements is that the output of any of
these steps can be directly realized as (pumped up to) a word element in syntax. 5
The constraints on each type place restrictions on the ordering relationship
among nominal suffixes:



(15) nom-pl:
       [ ORTH  [1] + tul
         STEM  [ nom-base
                 ORTH [1] ] ]

     nom-conj-p:
       [ STEM  [ nom-stem2
                 SEM [ INDEX □
                       RELS  □ ] ]
         SEM   [ INDEX □
                 RELS  □ ] ]

     nom-p:
       [ STEM  nom-stem1
         SYN.HEAD.CASE  pcase ]

     nom-zdel:
       [ STEM  nom-stem4 ]



These constraints on the nominal types place ordering restrictions on
nominal particles:

(16) a. *[nom-stem sensayngnim-tul-un]-eykey ‘teacher-Pl-Del-Postp’
     b. *[nom-stem3 sensayngnim-tul-kwa]-eykey ‘teacher-Pl-Conj-Postp’
     c. *[nom-stem sensayngnim-tul-un]-i ‘teacher-Pl-Del-Nom’



The so-called postposition eykey requires its STEM value to be an instance of
nom-stem1. This explains why (16a) and (16b) are not generated in the system.
The nominative marker can combine only with nom-stem4 or its subtypes. This
explains why the system does not generate cases like (16c). However, it correctly
generates cases like the following:

(17) a. [nom-base sensayngnim]-i ‘teacher-Nom’
     b. [nom-stem1 [nom-base sensayngnim]-tul]-kwa ‘teacher-Pl-Conj’

As noted, the type hierarchy system allows the STEM value to be any subtype
of the originally required one. For example, even though the case-marked nominal
(n-cmkr) element has nom-stem4 as its STEM value, nom-base can also
satisfy this requirement since it is a subtype of nom-stem4.
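
The nominal ordering facts in (16) and (17) follow from the same kind of check. The sketch below rests on several assumptions that go beyond the text: the supertype chain is read off (14), each suffixed type is placed directly under the next stem type, and the X-delimiter man is assumed to require a nom-stem3 stem; `PARENT`, `PARTICLES`, and `inflect` are hypothetical names, and the honorific -nim is treated as part of the base, as in (16).

```python
PARENT = {
    "nom-base": "nom-stem1", "nom-pl": "nom-stem1",
    "nom-stem1": "nom-stem2", "nom-p": "nom-stem2",
    "nom-stem2": "nom-stem3", "nom-conj-p": "nom-stem3",
    "nom-stem3": "nom-stem4", "nom-xdel": "nom-stem4",
    "nom-stem4": "nom-stem", "nom-zdel": "nom-stem",
}

# particle -> (type required of its STEM, type of the result)
PARTICLES = {
    "tul": ("nom-base", "nom-pl"),        # plural, as in (15)
    "eykey": ("nom-stem1", "nom-p"),      # postposition, as in (15)
    "kwa": ("nom-stem2", "nom-conj-p"),   # conjunction, as in (15)
    "man": ("nom-stem3", "nom-xdel"),     # X-delimiter (assumed constraint)
    "un": ("nom-stem4", "nom-zdel"),      # Z-delimiter, as in (15)
    "i": ("nom-stem4", "nom-zdel"),       # nominative case marker
}


def is_subtype(t, target):
    while t is not None:
        if t == target:
            return True
        t = PARENT.get(t)
    return False


def inflect(particles, start="nom-base"):
    """Return the resulting type, or None if an ordering constraint fails."""
    current = start
    for p in particles:
        required, result = PARTICLES[p]
        if not is_subtype(current, required):
            return None
        current = result
    return current


print(inflect(["tul", "eykey", "man", "un"]))  # (12): well-formed
print(inflect(["tul", "un", "eykey"]))         # (16a): rejected
print(inflect(["i"]))                          # (17a): well-formed
```

Unlike the verbal case, every intermediate result here may surface as a word, so no final type has to be reached before the form is handed to syntax.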

In sum, the morphological system we have shown makes Korean morphology
much simpler and can capture the ordering restrictions as well as the co-occurrence
restrictions. Other welcome consequences of adopting the typed feature
system come from the treatment of well-known mixed constructions such as sentential
nominalization and light verb constructions. Both of these have received much
attention because of their mixed properties.

5 The grammar specifies only v-free to be realized as v-word, whereas for nouns it
permits all the instances of type nominal to be realized as n-word. This in turn
means any subtype of nominal can serve as a syntactic element in accordance with the
type hierarchy in (13).

4 Multiple Inheritance Hierarchy: Advantages

One main property of the typed feature system we developed here is that it allows
us to adopt multiple inheritance hierarchies, commonly used in the object-oriented
programming paradigm to organize multiple dimensions of information
about objects in particular knowledge domains. In particular, this multiple
inheritance system provides a straightforward and efficient method of capturing
the mixed properties of phenomena such as light verb and nominalization
constructions, both of which are very common and notorious for their
syntactic complexity.



4.1 Nominalization 

One of the main puzzles in the treatment of Korean sentential nominalizations or 
verbal gerundive phrases (VGP) is that they display verbal properties internally 
and nominal properties externally. Internal verbal properties are prevalent. One 
telling piece of evidence comes from the inheritance of arguments from the lexeme 
verb from which the gerundive verb is derived. As shown in (18), the gerundive 
verb takes the same arguments, the nominative subject and accusative object: 

(18) [John-i    ecey       ku    chayk-ul/*uy   ilk-ess-um]-i       myonghwak-hata
      John-Nom  yesterday  that  book-Acc/*Gen  read-Past-Nmlz-Nom  clear-do
     ‘John’s having read the book yesterday is clear’

Various other phenomena also show that such gerundive phrases are internally
similar to VPs. They can include a sentential adverb as in (19a); an adverbial
element can modify the gerundive verb as in (19b); the phrase can include
the sentential negation marker an as in (19c); it can also contain the full range
of auxiliaries as in (19d); and the phrase allows free scrambling of its elements as in
(19e):

(19) a. John-i papokathi ku chayk-ul ilk-ess-um (Sent. Adv) 

John-Nom foolishly that book-Acc read-Past-Nmlz
‘John’s having foolishly read the book’

b. John-i chayk-ul ppalli/*ppalun ilk-um (Adv Mod) 

John-Nom book-Acc fast(adv)/*fast(adj) read-Nmlz 
‘John’s reading books fast.’ 

c. John-i chayk-ul an ilk-um (Sentential Neg) 

John-Nom book-Acc Neg read-Nmlz 

‘John’s not reading books.’ 

d. John-i chayk-ul ilk-ko siph-um (Aux verb) 

John-Nom book-Acc read-Comp want-Nmlz 
‘John’s wanting to read books’ 

e. ku chayk-ul John-i ilk-ess-um-(i nollapta) (Scrambling)

that book-Acc John-Nom read-Past-Nmlz-Nom surprising

‘It is surprising that John read the book.’

Meanwhile, its external structure is more like that of NPs. VGPs can appear
in the canonical NP positions such as subject or object as in (20a) or as a
postpositional object in (20b).

(20) a. [ai-ka chayk-ul ilk-um]-i nollapta 

child-Nom book-Acc read-Nmlz-Nom surprising 

‘That child’s reading a book is surprising’ 
b. [John-i enehak-ul kongpwuha-m]-eytayhay mollassta

John-Nom linguistics-Acc study-Nmlz-about not. know 

‘(We) didn’t know about John’s studying linguistics.’ 

These mixed properties of Korean sentential nominalization have provided a 
challenge to syntactic analyses with a strict version of X-bar theory. Various ap- 
proaches (see Malouf 1998 and references cited therein) have been proposed to 
solve this puzzle, but they all have ended up abandoning or modifying fundamen- 
tal theoretical conditions such as endocentricity, lexicalism, and null licensing. 

In the KRG with the multiple inheritance mechanism, the type v-ger is classified
as a subtype of both v-free and n-stem1, as represented in the following
hierarchy.




Such a cross-classification, allowing multiple inheritance, is also reflected in the 
feature descriptions in the LKB. The following represents a sample source code: 

v-ger := v-free & n-stem1 &
  [ SYN #syn & [ HEAD.MOD < > ],
    SEM #sem,
    ARGS < v-tns-stem & [ SYN #syn, SEM #sem ] > ].

As observed here, being a subtype of v-free and n-stem1 implies that v-ger will
inherit their properties. Since it is a subtype of v-free, v-ger will act just like
a verb: selecting arguments and assigning case values to them. In addition,
v-ger can undergo the same nominal suffixation process since it is a subtype of
n-stem1. For example, the gerundive verb ilk-ess-um will be generated through
the following informally represented structure in the KRG.

6 In capturing the mixed properties, the KRG system adopts the binary-valued
features VERBAL and NOMINAL. Nominalized verbs are assigned [VERBAL
+] and [NOMINAL +] with the HEAD value verb. Meanwhile, verbal nouns
differ from nominalized verbs with respect to the HEAD value: theirs is noun.

(22)



[ v-ger
  ORTH  ilk + ess + um
  SYN   [1] [ HEAD [ verb
                     VERBAL  +
                     NOMINAL + ] ]
  SEM   [3]
  STEM  [ v-tns-stem
          ORTH  ilk + ess
          SYN   [1]
          SEM   [3] [ [2] ⊕ past-tense ]
          STEM  [ v-tr
                  ORTH  ilk
                  SYN   □
                  SEM   [2] ] ] ]

The gerundive verb starts from a transitive lexeme ilk ‘read’ and forms a v-tns-stem
after the attachment of the past tense suffix ess. When the nominalizer suffix
attaches to this v-tns-stem, the result inherits the [NOMINAL +] feature. As such,
the various verbal properties are inherited from v-tran-lxm, whereas the nominal
properties come from the attachment of the nominalizer. This is a reflection of
how information flows in sentential nominalization:




As can be seen in (20), the ARG-ST information comes from the left element
since the nominalized N still needs to combine with the complement(s) of the
verb, while the categorial information comes from the right-hand nominalizer.
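
Since the paper itself points to the object-oriented use of multiple inheritance, the cross-classification of v-ger can be mirrored loosely in Python. This is an analogy only, under stated assumptions: the classes `VFree`, `NStem1`, and `VGer` and their methods are hypothetical, and Python's method resolution order is not the same mechanism as the unification of typed feature structures in the LKB.

```python
class VFree:
    """Stands in for v-free: carries the verbal side (argument structure)."""
    VERBAL = True

    def __init__(self, orth, arg_st):
        self.orth = orth
        self.arg_st = arg_st

    def case_frame(self):
        # The verb selects its arguments and assigns them case.
        return list(self.arg_st)


class NStem1:
    """Stands in for n-stem1: carries the nominal side (particle attachment)."""
    NOMINAL = True

    def add_particle(self, particle):
        return f"{self.orth}-{particle}"


class VGer(VFree, NStem1):
    """v-ger := v-free & n-stem1: inherits from both parents."""


ger = VGer("ilk-ess-um", ["nom", "acc"])
print(ger.VERBAL, ger.NOMINAL)     # both properties are inherited
print(ger.case_frame())            # verbal behaviour: nominative and accusative arguments
print(ger.add_particle("i"))       # nominal behaviour: takes the nominative marker
```

The gerundive form thus needs no rule of its own for either behaviour; each comes from one of its two supertypes.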

Such a treatment has a clear advantage over previous theoretical or computational
approaches in which nominalized verbs are simply taken to be either
verbs or nouns. If they are taken to be verbs, ad hoc mechanisms are required
to generate nominalized verbs bearing nominal suffixes, causing heavy parsing loads. If
they are simply taken to be nouns, we cannot account for why gerundive verbs
can also be inflected for tense and honorification and function just like verbs. The
multiple inheritance system, designed with fine-grained feature declarations,
avoids such issues.

4.2 Light Verb Constructions

As the name implies, VNs (verbal nouns) in Korean also display both nominal
and verbal properties. The case marking on the VN and the genitive case
marking on its argument indicate that VNs have nominal properties:

(24) John-i    mwullihak-uy  yonkwu-lul  hayessta
     John-Nom  physics-Gen   study-Acc   did
     ‘John studied physics.’

They also have verbal properties in the sense that they select arguments and
assign case markings to their arguments independently.

(25) a. John-i mwullihak-ul yonkwu (cwung) 

John-Nom physics-Acc study (while) 

‘John is in the middle of studying physics.’ 
b. John-i ku ceyphwum-ul mikwuk-eye yelshimhi swuchwul-ul hayessta

John-Nom the item-Acc US-Loc diligently export-Acc did 

‘John diligently exported the item to US.’ 

Just like the treatment of gerundive verbs, the multiple inheritance mecha- 
nism plays an important role in capturing the mixed properties. In the KRG, 
verbal nouns are also cross-classified as a subtype of both n-base and verbal. 

vn := n-base & verbal &
  [ SYN.HEAD.TYPE t-none,
    SEM [ MODE statement,
          INDEX event ] ].



This feature description implies that vn, being a subtype of n-base and verbal,
will inherit their properties. For example, the structure of the VN swuchwul
‘export’ would be something like the following:



(26) [ nom-zdel
       ORTH  swuchwul-ul
       STEM  [ vn
               ORTH  swuchwul
               SYN   □ [ HEAD    [ noun
                                   NOMINAL +
                                   VERBAL  + ]
                         ARG-ST  ⟨NP[nom], NP[acc], NP[dat]⟩ ] ]
       SYN   □ [ HEAD [CASE acc] ] ]



As a subtype of n-base, the VN has the HEAD value noun and the feature [NOMINAL +],
and as a subtype of verbal, it also inherits the [VERBAL +] feature
and the ARG-ST value. This then allows the VN to appear in any nominal
position while internally acting like a verbal element.

5 Testing the Feasibility of the System



The grammar we have built within the typed feature structure system here,
eventually aiming at working with real-world data, was first implemented
in the LKB. 7 In testing its performance and feasibility, we used the SERI
Test Suites ’97 after successfully parsing 250 self-designed sentences.
The SERI Test Suites (Sung and Jang 1997), carefully designed to evaluate the
performance of Korean syntactic parsers, consist of a total of 472 sentences (292 test
sentences representing the core phenomena of the language and 180 sentences
representing different types of predicate). In terms of lexical entries, they contain a total of
440 lexemes (269 nouns, 125 predicates, 35 adverbs, and 11 determiners) and
a total of 1937 word occurrences. As shown in the following table, the testing
results of the KRG prove quite robust:





                       # of Lexemes   # of Words   # of Sentences

SERI                   440            1937         472
KRG Parsing Results    440            1937         423
Coverage (%)           100            100          89.5



As the table shows, the system correctly generated all the lexemes and inflected
words in the test suites. In terms of parsing sentences, the grammar parsed
423 sentences out of a total of 472. The 49 failed sentences involve parts of the grammar
that have not yet been written in the current system. For example, the SERI Test Suites
include examples representing phenomena such as honorification, coordination,
and left dislocation of the subject. We believe that once we have a finer-grained
grammar for these phenomena, the KRG will resolve these remaining sentences.
Another promising indication of the test is that the mean number of parses (the average
number of parse trees) for the 423 parsed sentences is 1.67, keeping spurious
ambiguity at a minimum level.
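
The figures reported above can be cross-checked with a few lines of arithmetic. The snippet uses only numbers stated in the text; the variable names are ours. Note that 423/472 evaluates to roughly 89.6%, fractionally above the 89.5 printed in the table.

```python
# Composition of the SERI Test Suites as described in the text.
core_sentences = 292
predicate_sentences = 180
total_sentences = core_sentences + predicate_sentences

lexemes = 269 + 125 + 35 + 11      # nouns, predicates, adverbs, determiners

parsed = 423
failed = total_sentences - parsed
coverage = 100 * parsed / total_sentences

print(total_sentences, lexemes, failed)
print(f"sentence coverage: {coverage:.1f}%")
```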

As noted here, the test results provide clear evidence that the KRG, built
upon a typed feature structure system, offers high performance and can be extended
to large-scale data. Since the test suites here include most of the main
issues in analyzing the Korean language, we believe that further tests on
designated corpora will achieve nearly the same high performance.

7 Space does not allow us to explicate the morphological and semantic systems
of the KRG for Korean. As for morphology, we integrated MACH (Morphological
Analyzer for Contemporary Hangul), developed by Shim and Yang (2002). This system
segments words into sequences of morphemes with POS tags and morphological
information.

As for semantics, we adopted Minimal Recursion Semantics (MRS), developed by
Copestake et al. (2001). In the multilingual context in which this grammar has been
developed, a high premium is placed on parallel and consistent semantic representations
between grammars for different languages. Ensuring this parallelism enables
the reuse of the same downstream technology, no matter which language is used as
input. MRS is well suited to this purpose.

6 Conclusion

It is hard to deny that, in building an efficient grammar, expressive
accuracy has often been sacrificed in order to achieve computational tractability
(Oepen et al. 2002). However, putting linguistic generalizations aside has brought
difficulties in expanding the coverage and eventually in building a large-scale
grammar. To build an efficient parsing system for a language like Korean, which
displays intriguing morphological properties, a prerequisite is a system that
can build up morphological elements in a systematic way and project them into
syntax and semantics to achieve proper grammatical compatibility. Conventional
forms of standard morphological representations have proved problematic,
being able neither to capture linguistic generalizations nor to attain descriptive
adequacy. In contrast, the morphological and syntactic system we have developed
here with typed feature structures solves such preexisting problems while
keeping linguistic insights, thus making Korean morphology much simpler
(e.g., in capturing the ordering restrictions as well as co-occurrence restrictions).
Other welcome consequences of the present system come from the treatment
of well-known mixed constructions such as sentential nominalization and light verb
constructions. Both of these have received much attention because of their mixed
properties and have even been impediments to theoretical as well as computational
linguistics. We have seen that once we have a rigorously defined typed
feature structure system of grammar, all these fall out naturally with highly
efficient parsing performance.

Abstract. We have implemented a Japanese text processing system, combining
the existing parser and dictionary with the linguistic resources that we 
developed based on systemic functional linguistics. In this paper, we explain the 
text understanding algorithm of our system that utilizes the various linguistic 
resources in the Semiotic Base suggested by Halliday. First, we describe the 
structure of the SB and the linguistic resources stored in it. Then, we depict the 
text understanding algorithm using the SB. The process starts with 
morphological and dependency analyses by the non-SFL-based existing parser, 
followed by looking up the dictionary to enrich the input for SFL-based 
analysis. After mapping the pre-processing results onto systemic features, the 
path identification of selected features and unification based on O’Donnell are 
conducted with reference to the linguistic resource represented in the system 
networks. Consequently, we obtain graphological, lexicogrammatical, semantic 
and conceptual annotations of a given text. 



1 Introduction 

The purpose of this research is to implement a natural language processing system 
that follows the theoretical model of systemic functional linguistics (SFL). SFL aims 
at describing a language comprehensively and provides a unified way of modeling 
language use in context [1]. While SFL has been used as the basis for many natural
language generation systems (e.g., [2]), less work has been done for natural language 
understanding systems (e.g., [3]). 

Sugimoto [4] proposed the data structure of the Semiotic Base (SB), which stores 
SFL-based linguistic knowledge in computational form, and investigated how to 
incorporate the SB into a dialogue management model in order to enable an intelligent 
agent system to identify the current dialogue context and behave appropriately 
according to it. By elaborating their idea and combining the existing parser and 
dictionary with systemic resources, we implemented a Japanese text processing 
system that can conduct both understanding and generation of Japanese text. 

In this paper, we explain the text understanding algorithm of our text processing 
system that utilizes the various linguistic resources in the SB. First, we describe the 
structure of the SB and the linguistic resources stored in it. Then, we depict the text 
understanding algorithm using the SB and the outputs of the process. 



A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 26-37, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 
2 The Semiotic Base 

According to [4], the SB consists of four main and two subsidiary components as 
shown in Table 1 (see footnote 1). 



Table 1. Structure of the Semiotic Base 

Main components 
    Context Base (CB) 
        Situation Base 
        Stage Base 
        Concept Repository (CR) 
    Meaning Base (MB) 
    Wording Base (WB) 
    Expression Base (EB) 
Subsidiary components 
    Machine-readable dictionary 
        General Dictionary (GD) 
        Situation-Specific Dictionary 
    Corpus Base (texts with annotations) 



Following SFL, the authors of [4] employed distinctive perspectives in their design of the 
SB. One of them is the stratificational organization of a language in context. 
Corresponding to this, the main components are: Context Base (CB), which stores the 
features characterizing a given situation of dialogue and selection constraints on 
semantics specifying which semantic features are relevant to a given situation type; 
Meaning Base (MB), which stores features depicting the meanings associated with a 
situation type and constraints on lexicogrammar specifying which lexicogrammatical 
features are available in realizing a particular meaning in a situation type; Wording 
Base (WB), which stores the features to describe dialogue in terms of Japanese 
lexicogrammar and constraints specifying which graphological features are available 
for realizing particular lexicogrammatical features in a situation type; and 
Expression Base (EB), which is currently designed to deal with written texts and 
stores graphological features associated with rules to lay out the word list using a 
conventional publication language, e.g., HTML. 

Another characteristic is that the linguistic features in these bases and the 
influences of feature selection on the structure, on the feature of other units in the 
structure and on the relation between the units generated within and across the strata 
and the ranks are represented in the same manner, i.e., as system networks and 
realization statements. Fig. 1 shows an example of a system network and the 
associated realization statements extracted from WB. 

A system network is a directed graph that consists of systems whose terms are 
represented by linguistic features. In each system, only one feature can be selected. 
The selected feature may be an entry condition for other systems. In Fig. 1, 'major- 
clause' is the entry condition for two systems. If this feature is selected, of the first 
system, either 'effective' or 'middle' must be chosen, and of the second system, 
either 'material', 'mental', 'verbal-process' or 'relational' must be selected. If 'effective' 
and 'material' are selected, 'mat-doing' is selected. 
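The selection behaviour just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Java implementation: the system names `agency`, `process-type` and `doing` are hypothetical labels, and a system with a single term is assumed to fire automatically once its entry condition holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """One system: an entry condition plus mutually exclusive terms."""
    entry: frozenset  # features that must all be selected to enter the system
    terms: tuple      # at most one of these may be selected


# Fragment of Fig. 1. The dictionary keys are hypothetical system names.
NETWORK = {
    "agency": System(frozenset({"major-clause"}), ("effective", "middle")),
    "process-type": System(
        frozenset({"major-clause"}),
        ("material", "mental", "verbal-process", "relational"),
    ),
    "doing": System(frozenset({"effective", "material"}), ("mat-doing",)),
}


def select(initial, choices):
    """Return the set of selected features.

    `choices` maps a system name to the term chosen in that system.
    """
    selected = set(initial)
    changed = True
    while changed:
        changed = False
        for name, system in NETWORK.items():
            entered = system.entry <= selected
            already_chosen = bool(selected & set(system.terms))
            if not entered or already_chosen:
                continue
            # Assumption: a one-term system needs no explicit choice.
            term = system.terms[0] if len(system.terms) == 1 else choices.get(name)
            if term is None:
                continue
            if term not in system.terms:
                raise ValueError(f"{term!r} is not a term of system {name!r}")
            selected.add(term)
            changed = True
    return selected


doing_clause = select({"major-clause"},
                      {"agency": "effective", "process-type": "material"})
middle_clause = select({"major-clause"},
                       {"agency": "middle", "process-type": "material"})
```

With these inputs, `doing_clause` contains 'mat-doing' because both 'effective' and 'material' were selected, whereas `middle_clause` does not.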



1 In the table, the dotted cell indicates the contents that are not relevant to the process 
explained in this paper. 
major-clause 
    realization: insert Rheme; insert Process; insert Predicator; insert Medium; 
                 partition Medium Process; conflate Process Predicator 
    system: effective | middle 
        effective 
            realization: insert Agent 
    system: material | mental | verbal-process | relational 
        material 
            realization: insert Actor; 
                         preselect Actor ngrp-participant-head or ngrp-participant; 
                         preselect Process verbal-group 
    effective and material -> mat-doing 
        realization: insert Goal; 
                     preselect Goal ngrp-participant-head or ngrp-participant; 
                     conflate Goal Medium; conflate Actor Agent 

Fig. 1. Fragment of a system network and associated realization statements 




Some features are associated with realization statements, which are used to specify 
instance structures containing these features. We assume that an instance structure is a 
tree whose nodes are called units. A unit consists of a feature selection path of SFL 
features from the root feature of the system network and a set of roles that this unit is 
considered to play for the parent unit. In Fig. 1, 'insert Agent', which is associated 
with 'effective', is a realization statement that means a unit containing 'effective' 
should have a child unit whose role is 'Agent'. 'Preselect Process verbal-group', which 
is associated with 'material', requires that if a unit containing 'material' has a child unit 
whose role is 'Process', this child unit should contain a feature 'verbal-group'. 
'Conflate Goal Medium' associated with 'mat-doing' indicates the child units whose 
role is 'Goal' and 'Medium' must be unified. 
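As a rough illustration of how the three kinds of realization statement act on an instance structure, the following Python sketch applies some of the statements of Fig. 1 to a clause unit. The `Unit` class and the function names are assumptions made for this example; the paper does not show the actual data structures.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: two units are never "equal" by content
class Unit:
    features: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def child_with(self, role):
        """Return the first child unit playing `role`, or None."""
        return next((c for c in self.children if role in c.roles), None)


def insert(parent, role):
    """'insert R': the parent must have a child unit whose role is R."""
    if parent.child_with(role) is None:
        parent.children.append(Unit(roles={role}))


def preselect(parent, role, feature):
    """'preselect R f': the child playing R must contain feature f."""
    child = parent.child_with(role)
    if child is not None:
        child.features.add(feature)


def conflate(parent, role_a, role_b):
    """'conflate A B': the children playing A and B are unified into one unit."""
    a, b = parent.child_with(role_a), parent.child_with(role_b)
    if a is None or b is None or a is b:
        return
    a.roles |= b.roles
    a.features |= b.features
    a.children += b.children
    parent.children.remove(b)


clause = Unit(features={"major-clause", "effective", "material", "mat-doing"})
for role in ("Agent", "Actor", "Goal", "Medium", "Process"):
    insert(clause, role)
preselect(clause, "Process", "verbal-group")
conflate(clause, "Goal", "Medium")
conflate(clause, "Actor", "Agent")
```

After these calls the clause has three child units: one playing Actor and Agent, one playing Goal and Medium, and a Process unit that carries 'verbal-group'.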

The realization statements are precompiled in what O'Donnell [3] calls partial- 
structure and used in lexicogrammatical and semantic analyses explained below. A 
linking partial-structure represents a possible pattern of parent/child unit pairs, and is 
compiled from a combination of insert, conflate and preselect realization statements 
in the WB and MB networks. An ordering partial-structure specifies a constraint on 
the ordering of child units of a parent/child unit pair, and is compiled from insert, 
conflate and order statements. Fig. 2 represents the partial structures converted from 
the realization statements shown in Fig. 1. 

We divide CB into three sub-components: Situation Base, Stage Base, and Concept 
Repository (CR). 2 In CR we provide the concepts in the form of frame representation, 
associated systemic features and roles, and EDR concept identifier [5]. Table 2 shows 
an example of a CR record. 



2 The contents of CB mentioned in [4] are stored in what we call Situation Base. The contents 
of Stage Base and CR here roughly correspond to interaction plans in Plan Library and 
concept frames in Knowledge Base proposed in [4]. We assume that these are part of 
linguistic knowledge, hence we include them in the SB. 
link_Actor_1 
link_Actor_2 
    parent: (:and wb clause clause-simplex major-clause material) 
    role:   Actor 
    child:  (:and wb group-phrase groups nominal-group 
              (:or (:and ngrp-simplex nominal-head ngrp-participant-head) 
                   (:and ngrp-complex ngrp-hypotactic ngrp-participant))) 

link_Goal_3 
    (contents not legible in the source) 

link_Process_4 
    parent: (:and wb clause clause-simplex major-clause) 
    role:   Process/Predicator 
    child:  (:and wb group-phrase groups verbal-group) 

order_Medium#Process_5 
    parent: (:and wb clause clause-simplex major-clause) 
    order:  Medium # Process/Predicator 

Fig. 2. Examples of partial-structures 

The records in CR are sorted according to a situation type to which a given concept is 
relevant. In this sense, CR is different from EDR concept dictionary, which is 
designed to serve as a general taxonomy. 

In addition to the main bases, the SB accommodates Corpus Base and a machine- 
readable dictionary. We provide two types of machine-readable dictionary: General 
Dictionary (GD) and Situation-Specific Dictionary. Both store ordinary dictionary 
information on lexical items, associated systemic features and roles, and EDR concept 
identifier and concept relation label. Table 3 shows an example of GD record. 
Table 2. Example of Concept Repository record 

Head Concept Name        writing 
Concept Type             class 
EDR Concept Identifier   0fe07c 
MB Features              fg-creative 
WB Features              creative 
Upper Concept Name       domain-action 

     Slot Name     Slot Value Type    SFL Role 
1    agent         agent              Actor 
2    object        document           Goal 
3    instrument    word-processor     Means 


Table 3. Example of General Dictionary record 

Headword                 書く "writing" 
Kana                     かく "kaku" 
EDR Part of Speech       JVE (i.e., verb) 
EDR Concept Identifier   0fe07c 
MB Features              fg-creative 
WB Features              creative&lg-concrete or creative&lg-abstract 
SFL Roles for Headword   Event 

     EDR Concept Relation Label    SFL Roles 
1    agent                         Actor&Agent 
2    object                        Goal&Medium 
3    instrument                    Means 



3 Text Understanding Algorithms 

Fig. 3 shows the flow of the text understanding process with the SB. In this section, 
we briefly explain each phase in this diagram using an example output of our system 
shown partly in Table 4. 



3.1 Graphological Analysis 

In this phase, the graphological instance structure of the text is constructed referring 
to EB. All that the current system does is to recognize sentence boundaries based on 
punctuation. This process is independent of the other processes. 



3.2 Preprocessing 

After the morphological analysis and the dependency structure analyses [6] and the 
GD lookup are done, EDR concept identifiers are assigned to each word segment and 
EDR concept relation labels to each dependency pair of bunsetsu (i.e., phrase) 
segments by looking up the EDR dictionary of selectional restrictions for Japanese 
verbs, the Japanese co-occurrence dictionary and the concept dictionary. Some word 
sense ambiguities may also be resolved in this phase. For instance, there are four 
records whose headword is “kaku” in GD. Each record has a different concept 
identifier. After looking up the EDR dictionary of selectional restrictions for Japanese 
verbs with reference to the concept identifier given to the verb record, we obtain 
the specification of the selectional restriction. By checking the consistency of the selectional 
restriction with the concept identifiers assigned to each word in the input sentence, we 
can reduce the four to one. Then, candidates for the SFL roles of each segment are 
obtained from the concept relation labels by referring to the mapping table in the GD 
record. For example, the 'implement' concept relation label is assigned to the fourth 
segment "waapuro-de" by looking up the concept dictionary, and this segment is 
identified as playing the 'Means' role for the fifth segment "kaki-tai" on which it depends. 

Table 4. Example of the output of text understanding process (in part) 

[Table body not recoverable from the source.] 
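The sense filtering and role mapping of Section 3.2 can be sketched as follows. Everything except the identifier `0fe07c` and the role names is a made-up placeholder: the other three sense identifiers, the restriction table, and the small concept hierarchy stand in for the EDR dictionaries, whose real contents are not shown in the paper.

```python
# Four GD records share the headword "kaku"; only "0fe07c" is from the paper.
KAKU_SENSES = ["0fe07c", "sense-b", "sense-c", "sense-d"]

# Hypothetical selectional restrictions:
# verb sense -> {concept relation label: required class of the filler}
RESTRICTIONS = {
    "0fe07c": {"agent": "person", "object": "document", "implement": "tool"},
    "sense-b": {"agent": "person", "object": "body-part"},
    "sense-c": {"agent": "person", "object": "liquid"},
    "sense-d": {"agent": "person", "object": "picture"},
}

# Hypothetical fragment of a concept hierarchy (child -> parent).
IS_A = {"report": "document", "word-processor": "tool"}

# Mapping from concept relation labels to SFL role candidates.
ROLE_MAP = {"agent": "Actor&Agent", "object": "Goal&Medium", "implement": "Means"}


def is_a(concept, cls):
    """True if `concept` equals `cls` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == cls:
            return True
        concept = IS_A.get(concept)
    return False


def consistent(sense, fillers):
    """Check every (relation label, filler concept) pair against one sense."""
    required = RESTRICTIONS[sense]
    return all(label in required and is_a(concept, required[label])
               for label, concept in fillers.items())


def disambiguate(senses, fillers):
    """Keep only the senses whose restrictions fit the observed fillers."""
    return [s for s in senses if consistent(s, fillers)]


fillers = {"object": "report", "implement": "word-processor"}
remaining = disambiguate(KAKU_SENSES, fillers)
roles = {label: ROLE_MAP[label] for label in fillers}
```

In this toy setting the four candidate senses are reduced to the single sense `0fe07c`, and the 'implement' label yields the 'Means' role candidate.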



3.3 Lexicogrammatical Analysis 

The goal of this phase is the construction of the lexicogrammatical instance structure 
of the text. A lexicogrammatical instance structure is constructed by referring to the 
WB resource, i.e., the system networks and the realization statements. Our method is 
based on O'Donnell's idea realized in his WAG systemic parser [3], which uses data 
structures called partial-structures and a bottom-up chart parsing strategy. The phase 
can be divided into five steps as indicated in Fig. 4. We will explain each step in the 
following subsections. 




Fig. 4. Flow of lexicogrammatical analysis 



3.3.1 Construction of Morpheme Rank Units 

The first step is to construct morpheme rank units based on the result of the 
preprocessing. Of the lexicogrammatical features drawn from the preprocessing, the 
features that are located at the morpheme rank system network are assigned to each 
morpheme unit. For example, the fifth morpheme unit "hookoku" and the sixth "syo" 
are given the WB features "base" and "suffix", respectively. 



3.3.2 Extension of the Constructed Instance Structure 

The second step involves looking up the precompiled linking partial-structures 
explained in Section 2 and collecting the partial-structures that have compatible 
selection paths with the root unit of the constructed structures. We can find the 
partial-structures whose child unit contains "base" or "suffix" as shown below. 
(:and wb morpheme base)      (:and wb morpheme suffix) 
       "hookoku"                      "syo" 

Fig. 5. Morpheme rank instance structures 



link_Head 
    (:and wb word word-simplex) 
        | Head 
    (:and wb morpheme base) 

link_Modifier 
    (:and wb word word-simplex hasei-go suffixation) 
        | Modifier 
    (:and wb morpheme suffix) 

Fig. 6. Linking partial-structures for “base” and “suffix” 



Fig. 7 shows the result of unification of the morpheme units into each of these linking 
partial-structures. 




(:and wb morpheme base)      (:and wb morpheme suffix) 
       "hookoku"                      "syo" 

Fig. 7. Two morpheme and word rank instance structures 



3.3.3 Unification of the Root Units of the Constructed Instance Structures 

The third step is to attempt to unify the root units of the constructed instance 
structures. As indicated in Fig. 7, the parent units of "hookoku" and "syo" have 
compatible selection paths. Thus, we can unify these units, and this new unit 
corresponds to a word "hookokusyo". The units that have succeeded in unification are 
incorporated into the parse-chart as an active edge. 



(:and wb word word-simplex hasei-go suffixation) 
        | Head                     | Modifier 
(:and wb morpheme base)      (:and wb morpheme suffix) 
       "hookoku"                      "syo" 

Fig. 8. One unified morpheme and word rank instance structure 
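The compatibility test behind this unification step can be approximated with feature sets: two selection paths are compatible if their union never contains two terms of the same system. The sketch below is an assumption-laden simplification; the system names and the feature `word-complex` are hypothetical, and real selection paths are paths through the WB network rather than flat sets.

```python
# Each entry lists mutually exclusive features of one system.
# System names and "word-complex" are hypothetical placeholders.
SYSTEMS = {
    "rank": {"morpheme", "word"},
    "word-complexity": {"word-simplex", "word-complex"},
}


def unify(path_a, path_b):
    """Return the merged feature set, or None if the paths are incompatible."""
    merged = set(path_a) | set(path_b)
    for terms in SYSTEMS.values():
        if len(merged & terms) > 1:  # two terms of one system: clash
            return None
    return merged


# Parent units of "hookoku" and "syo" after the extension step.
head_parent = {"wb", "word", "word-simplex"}
modifier_parent = {"wb", "word", "word-simplex", "hasei-go", "suffixation"}

word_unit = unify(head_parent, modifier_parent)
clash = unify({"wb", "morpheme", "base"}, head_parent)
```

Here `word_unit` carries the features of the unified "hookokusyo" unit, while `clash` is `None` because 'morpheme' and 'word' belong to the same system.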

3.3.4 Checking Completeness of an Active Edge 

The fourth step involves looking up the precompiled ordering partial-structures 
explained in Section 2 and collecting the partial-structures that contain the same 
features as the root unit of the active edge. Suppose the instance structure shown in 
Fig. 8 is currently in the chart as an active edge. In order to see whether this is 
completed or not, the following ordering partial-structure, which has the same 
selection path as the "hookokusyo" unit, is relevant. 










N. Ito, T. Sugimoto, and M. Sugeno 



order_Head#Modifier 
    (:and wb word word-simplex hasei-go suffixation) 
    Head # Modifier 

Fig. 9. Ordering partial-structure 

This tells us that a given unit can have two child units, one with the Head role and the 
other with the Modifier role, and that the Head unit precedes the Modifier. As shown in Fig. 8, 
"hookokusyo" unit has two child units that meet such role restrictions. Therefore, the 
active edge in question can be regarded as completed. Then, this is incorporated into 
the chart as a passive edge and the fifth step is conducted. 

If a given active edge is not completed, it needs to wait for other units to unify. The 
process starts reading the next morpheme boundary and constructing a new 
morpheme unit. As each morpheme unit is incorporated into the parse-chart, the 
parser then moves on to incorporate the next morpheme unit, until all morpheme units 
are incorporated. 
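The completeness check of this step can be sketched as a comparison between the roles of the child units, in surface order, and the sequence required by the ordering partial-structure. The function below assumes, for simplicity, that the ordering partial-structure lists every child of the unit; the paper does not state whether partial orderings are also allowed.

```python
def is_complete(children_roles, ordering):
    """Check an active edge against an ordering partial-structure.

    children_roles: list of role sets, one per child unit, in surface order.
    ordering: required roles in their required order, e.g. ("Head", "Modifier").
    """
    if len(children_roles) != len(ordering):
        return False
    return all(required in roles
               for roles, required in zip(children_roles, ordering))


ORDER_HEAD_MODIFIER = ("Head", "Modifier")

hookokusyo = [{"Head"}, {"Modifier"}]   # "hookoku" + "syo"
only_head = [{"Head"}]                  # still waiting for a Modifier

edge_is_passive = is_complete(hookokusyo, ORDER_HEAD_MODIFIER)
edge_still_active = not is_complete(only_head, ORDER_HEAD_MODIFIER)
```

An edge for which `is_complete` holds would be moved to the chart as a passive edge; otherwise it stays active and waits for further units.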

3.3.5 Addition of Lexicogrammatical Features to a Passive Edge 

The fifth step is to add lexicogrammatical features to the root unit of a passive edge 
by referring to the result of the preprocessing. Suppose the instance structure shown 
in Fig. 8 is currently in the chart as a passive edge. By referring to the result of GD 
lookup, we can add word rank features such as “common-noun” to the root unit 
“hookokusyo.” Then, we go back to the second step, searching linking partial- 
structures whose child unit has compatible selection paths with the root of the passive 
edge to unify the edge with appropriate linking partial-structures. 

Using these data structures and a unification algorithm for units, an instance 
structure is constructed from morpheme rank units in a bottom-up manner. 



3.4 Semantic Analysis 

In this phase, the semantic instance structure of the text is constructed referring to the 
lexicogrammatical instance structure and the semantic features of the words in the 
text. The parent/child relation of the semantic units is augmented and the consistency 
among them is verified in a way similar to that of the lexicogrammatical analysis. 

The current version of the implementation deals with ideational semantic features 
that characterize a proposition of a text [7], interpersonal semantic features that 
characterize a speech act [8], and textual features that characterize a rhetorical 
structure [9]. 



3.5 Conceptual Analysis 

Finally, the conceptual analysis is conducted to create an instance concept frame 
representing the conceptual content of the input text. Slots of the instance concept are 
filled recursively by other instance concepts, which correspond to child segments of 
the text. Type constraints on slot fillers are checked according to both the class 
hierarchy of the concept repository and the EDR concept classification hierarchy that 
has richer contents for general concepts. 
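A minimal sketch of recursive slot filling with type checking is given below. The slot types for 'writing' follow Table 2; the concept hierarchy, the concept names 'report' and 'person', and the tuple representation of text segments are assumptions introduced only for this example.

```python
# Slot value types for the 'writing' concept, as in Table 2.
SLOT_TYPES = {
    "writing": {"agent": "agent", "object": "document",
                "instrument": "word-processor"},
}

# Hypothetical class hierarchy (child -> parent), standing in for both the
# CR hierarchy and the EDR concept classification.
UPPER = {"report": "document", "person": "agent"}


def is_a(concept, cls):
    """True if `concept` equals `cls` or is one of its descendants."""
    while concept is not None:
        if concept == cls:
            return True
        concept = UPPER.get(concept)
    return False


def instantiate(segment):
    """Build an instance concept frame from (concept, {slot: child segment})."""
    concept, children = segment
    frame = {"concept": concept, "slots": {}}
    for slot, child in children.items():
        expected = SLOT_TYPES.get(concept, {})[slot]
        child_frame = instantiate(child)  # slots are filled recursively
        if not is_a(child_frame["concept"], expected):
            raise TypeError(f"{slot}: {child_frame['concept']} is not a {expected}")
        frame["slots"][slot] = child_frame
    return frame


sentence = ("writing", {
    "agent": ("person", {}),
    "object": ("report", {}),
    "instrument": ("word-processor", {}),
})
frame = instantiate(sentence)
```

A filler of the wrong type, for example a 'person' in the object slot, would raise `TypeError` in this sketch; how the actual system reports such a violation is not described in the paper.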



4 Discussion 

SFL has been used successfully in text generation systems, e.g., [2, 10]. On the other 
hand, only a few SFL-based text understanding systems have been developed. Most 
of them deal with small fragments of the whole theory due to implementation 
constraints [11, 12]. Others are approaches where the systemic description of language is 
converted into the representation proposed by another grammatical theory, such as 
HPSG [13, 14]. It does not seem that these systems fully utilize the theoretical 
features of SFL. One exception is WAG system [3], where a text is analyzed directly 
using SFL resources carefully compiled to cope with efficiency problems. 

We extend the parsing method described in [3] in several significant ways. In 
particular, we incorporate the results of the preprocessing phase to add information to 
instance structures and filter out implausible interpretations. When a new inactive 
edge is added to the chart, an import of word information into the root unit of the edge 
is attempted. When the parser attempts to create a larger active edge by unifying the root unit 
of an inactive edge with a unit contained in a partial-structure or an active edge, 
it unifies the feature/role information and verifies the consistency between the 
current instance structure and the dependency structure identified in the preprocessing 
phase, in addition to performing the standard unification of feature selection paths [15]. 
For example, for nominal groups with the case particle "de", four linking 
partial-structures with role 'Means', 'Quality', 'TemporalLocation' and 
'SpatialLocation' are compiled from the current WB network. However, only the first 
one is consistent with the preprocessing result and is used to construct the instance 
structure for this sentence. 
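The filtering of linking partial-structures by preprocessing results might look like the sketch below. The four role names come from the text; the dictionaries that hold the role candidates and the dependency pairs are hypothetical representations of the preprocessing output.

```python
# Roles of the four linking partial-structures compiled for "de" groups.
DE_LINK_ROLES = ["Means", "Quality", "TemporalLocation", "SpatialLocation"]

# Assumed shape of the preprocessing output for
# "watasi-wa syuttyoo-no hookokusyo-o waapuro-de kaki-tai":
role_candidates = {4: {"Means"}}   # segment 4 = "waapuro-de"
dependencies = {4: 5}              # segment 4 depends on segment 5 = "kaki-tai"


def consistent_links(link_roles, segment, head, role_candidates, dependencies):
    """Keep the linking roles that agree with the preprocessing result."""
    if dependencies.get(segment) != head:
        return []  # the proposed parent is not the dependency head
    allowed = role_candidates.get(segment, set())
    return [role for role in link_roles if role in allowed]


kept = consistent_links(DE_LINK_ROLES, 4, 5, role_candidates, dependencies)
```

Only the 'Means' structure survives for this sentence, which matches the outcome described above.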

We introduce inter-stratal linking partial-structures that declaratively represent the 
relationship between lexicogrammatical units and semantic ones. They are used to 
map instance structures from the lexicogrammatical stratum to the semantic stratum. 
Fig. 10 represents an example of inter-stratal linking partial-structures. 

wblink_1 
    (contents not legible in the source) 

Fig. 10. Example of inter-stratal linking partial-structures 



Moreover, we incorporate a forward chaining style of inferences to deal with co- 
selection constraints [2], a gate having only one feature and default feature selections. 



5 Conclusion 

The SB and the text understanding algorithm illustrated in this paper have been 
implemented in Java, and the results of the analyses are output as XML annotations 
on the input texts. The current version of the SB has approximately 700 systems, 1600 
features, 1100 realization statements, 130 records in CR, and 70 records in GD. The 
system can analyze a nominal group complex that consists of more than one nominal 
group combined by the particle “no,” such as “syuttyoo-no hookokusyo-o”; a clause in 
which an obligatory unit is elided, e.g., “watasi-wa” in “watasi-wa syuttyoo-no 
hookokusyo-o waapuro-de kaki-tai”; and a clause that contains an optional unit, e.g., 
“waapuro-de” in “watasi-wa syuttyoo-no hookokusyo-o waapuro-de kaki-tai.” We have also 
implemented a text generation system that utilizes the resources in the SB, which 
assures us that the resources are reusable [16]. 

We extend O’Donnell’s idea by adding a method for unifying the results of non- 
SFL-based NLP tools, and this enables us to deal with Japanese text. Our system can 
be regarded as a hybrid parser. Adopting SFL as the basis for the system enables us to 
deal with a wider range of language for linguistic analysis. By combining the existing 
parser and dictionary with the systemic resource, we aim at reducing the cost for 
system development and keeping the standard accuracy of the analysis. 

We close with the limitations of the current work and directions for future work. Regarding 
lexicogrammatical analysis, the resource for rankshift and grammatical metaphor is 
under construction, so the system may not construct appropriate instance structures for 
an input text with an adnominal clause, embedded clause or nominalization. As for 
contextual analysis, the system can deal with concepts manifested in a text by referring 
to CR. An algorithm for inferring contextual configuration from linguistic behavior with 
reference to the entire SB is under design. These points will be addressed in 
future work. 
Abstract. Grammatical Framework (GF) [5] is a grammar formalism 
for describing formal and natural languages. An application grammar 
in GF is usually written for a restricted language domain, e.g. to map 
a formal language to a natural language. A resource grammar, on the 
other hand, aims at a complete description of a natural language. The 
language-independent grammar API (Application Programmer’s Inter- 
face) allows the user of a resource grammar to build application gram- 
mars in the same way as a programmer writes programs using a stan- 
dard library. In an ongoing project, we have developed an API suitable 
for technical language, and implemented it for English, Finnish, French, 
German, Italian, Russian, and Swedish. This paper gives an outline of 
the project using Russian as an example. 



1 The GF Resource Grammar Library 

The Grammatical Framework (GF) is a grammar formalism based on type theory 
[5]. GF grammars can be considered as programs written in the GF grammar 
language, which can be compiled by the GF program. Just as with ordinary 
programming languages, the efficiency of programming labor can be increased by 
reusing previously written code. For that purpose standard libraries are usually 
used. To use the library a programmer only needs to know the type signatures 
of the library functions. Implementation details are hidden from the user. 

The GF resource grammar library [4] is intended to serve as a standard library 
for the GF grammar language. It aims at fairly complete descriptions of different 
natural languages, starting from the perspective of linguistic structure rather 
than the logical structure of applications. The current coverage is comparable with, 
but still smaller than, the Core Language Engine (CLE) project [2]. 

Since GF is a multilingual system the library structure has an additional 
dimension for different languages. Each language has its own layer, produced by and 
visible to the linguist grammarian. What is visible to the application grammarian 
is an API (Application Programmer’s Interface), which abstracts away from 
linguistic details and is therefore, to a large extent, language-independent. The 
module structure of a resource grammar layer corresponding to one language is 
shown in Fig. 1. Arrows indicate the dependencies among the modules. 

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 38—41, 2004. 
Fig. 1. The resource grammar structure (main modules). One language layer. Shadowed 
boxes represent high-level or interface modules. White boxes represent low-level 
or implementation modules. Arrows show the dependencies. 



The Russian grammar was written after grammars for English, Swedish, 
French and German. The language-independent modules, defining the coverage 
of the resource library, were therefore ready. The task was to instantiate these 
modules for Russian. As a reference for Russian language, [3,6,7] were used. 

2 An Example: Arithmetic Grammar 

Here we consider some fragments from a simple arithmetic grammar written 
using the Russian resource grammar library, which allows us to construct state- 
ments like one is even or the product of zero and one equals zero. 

The abstract part describes the meaning captured in this arithmetic gram- 
mar. This is done by defining some categories and functions: 

cat 
  Prop ;      -- proposition 
  Dom ;       -- domain of quantification 
  Elem Dom ;  -- individual element of a domain 

fun 
  zero  : Elem Nat ;                      -- zero constructor 
  Even  : Elem Nat -> Prop ;              -- evenness predicate 
  EqNat : (m,n : Elem Nat) -> Prop ;      -- equality predicate 
  prod  : (m,n : Elem Nat) -> Elem Nat ;  -- product function 

To linearize the semantic categories and functions of the application grammar, 
we use grammatical categories and functions from the resource grammar: 

lincat 
  Dom  = N ;   -- Common Noun category 
  Prop = S ;   -- Sentence category 
  Elem = NP ;  -- Noun Phrase category 

lin 
  zero  = DefOneNP (UseN nol) ; 
  Even  = predA1 (AdjP1 (adj1Star "четн")) ; 
  EqNat = predV2 ravnjatsja ; 
  prod  = appFunColl (funGen proizvedenie) ; 



Some of the functions (nol, ravnjatsja, and proizvedenie) are lexical entities 
defined in the resource, ready with their inflectional forms (which can mean 
dozens of forms in Russian), gender, etc. The application grammarian just has 
to pick the right ones. Other functions, such as adj1Star, are lexical inflection 
patterns. To use them, one has to provide the word stem and choose the correct 
pattern. 

The rest of the functions are from the language-independent API. Here are 
their type signatures: 



AdjP1      : Adj1 -> AP ;             -- adjective from lexicon 
predA1     : AP -> VP ;               -- adjectival predication 
DefOneNP   : CN -> NP ;               -- singular definite phrase 
UseN       : N -> CN ;                -- noun from lexicon 
appFunColl : Fun -> NP -> NP -> NP ;  -- collective function appl 
predV2     : V2 -> NP -> NP -> S ;    -- two-place verb predic 



The user of the library has to be familiar with notions of constituency, but not 
with linguistic details such as inflection, agreement, and word order. 

Writing even a small grammar in an inflectionally rich language like Russian 
requires a lot of work on morphology. This is the part where using the resource 
grammar library really helps to speed up development, since the resource functions 
for adding new lexical entries are relatively easy to use. 

Syntactic rules are trickier and require a fair knowledge of the type system 
used. However, they raise the level of the code, which is written using only function 
application. The resource style is also less error prone, since the correctness of 
the library functions is presupposed. 

Using the resource grammar API, an application grammar can be imple- 
mented for different languages in a similar manner, since there is a shared 
language-independent API part and also because the libraries for different lan- 
guages have similar structures. Often the same API functions can be used in 
different languages; but it may also happen that e.g. adjectival predication in 
one language is replaced by verbal predication in another. 
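The separation between a shared abstract syntax and language-specific linearization can be imitated outside GF. The Python sketch below is only an analogy, not GF code and not part of the resource library: abstract trees are built from the function names of the arithmetic grammar, and a concrete syntax is a table of string templates. Only an English table is given, since a Russian one would need the inflection and agreement machinery that the resource grammar provides; the constant `one` is taken from the example sentences rather than from the grammar fragment shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Abstract syntax tree: a function name applied to argument trees."""
    fun: str
    args: tuple = ()


# Toy concrete syntax for English: one template per abstract function.
ENGLISH = {
    "zero": "zero",
    "one": "one",
    "Even": "{0} is even",
    "EqNat": "{0} equals {1}",
    "prod": "the product of {0} and {1}",
}


def linearize(tree, concrete):
    """Turn an abstract tree into a string using one concrete syntax."""
    parts = [linearize(arg, concrete) for arg in tree.args]
    return concrete[tree.fun].format(*parts)


zero, one = Tree("zero"), Tree("one")
even_one = Tree("Even", (one,))
equation = Tree("EqNat", (Tree("prod", (zero, one)), zero))

sentence_1 = linearize(even_one, ENGLISH)
sentence_2 = linearize(equation, ENGLISH)
```

The same `Tree` values could be passed to another table to obtain a different language, which is the property the shared API is meant to preserve; string templates are of course far too weak for real morphology.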

Fig. 2 shows a simple theorem proof constructed by using the arithmetic 
grammars for Russian and English. The example was built with help of GF 
Syntax Editor [1]. 



3 Conclusion 

A library of resource grammars is essential for a wider use of GF. In a gram- 
mar formalism, libraries are even more important than in a general-purpose 
programming language, since writing grammars for natural languages is such a 
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/* TeopeMa . /Uns JiK>6oro skicna x,x- MembiM mjim x - HeseTHbiM . /}0Ka3aTejibCTB0 . 
floKa3aTenbCTBO no MHflyKLjMM. Ea3MC, ComacHO nepBOM aKCMOMe seTHOCTM , Honb - sen-ioe mmcjio . 
TeM bojiee , Honb - MembiM mjim Honb - HeseTHbiPi . LUar MHflyKHMM, paccMOTpMM hmcjio x n 
npeflnojio>KMM x - seTHbiPi mjim x- HesembiPi ( h ) . h . Bo3mo>kho flBa cnysas . nepBbiM cny-iaPi, 
AonycTHM x- HeTHbiM ( a ) . a . ComacHO btopom aKCMOMe MemocTM, mmcjio, cneflynoii^ee 3a x- 
HeMOTHO© . TeM bojiee , mmcjio , cneflywu^ee 3a x- seTHoe mjim hmcjio , cjieflyiomee 3a x- HeseTHoe 
Btopom caysaPi, flonycTMM x - He^en-ibiM ( b ) . b . ComacHO TpeTbePi axcuoMe seiHocTM, smcjio, 
cneflyK)L 4 ee 3a x- seTHoe . TeM bonee , sucno , cneflyK>mee 3a x- MeTHoe mjim mmcjio , cjieflyHDLMee 
3a x - HeseTHoe . T.o. hmcdo , cneflyfOLi^ee 3a x - sen-ioe mjim hmcjio , cjieflyK)L4ee 3a x - Hesen-ioe B 
o6omx cjiysasx. CjieflOBaTejibHo, ajih Bcex MMceji x , x - sembiM mjim x - HeseTHbiPi . 7 

************ 

Theorem. For all numbers x, x is even or x is odd. 

Proof. We proceed by induction. For the basis, by the first axiom of evenness, zero is even. A 
fortiori, zero is even or zero is odd. For the induction step, consider a number x and assume x is even 
or x is odd ( h ). By the hypothesis h, x is even or x is odd. There are two possibilities. First, assume x 
is even ( a ). By the hypothesis a. x is even. By the second axiom of evenness, the successor of x is 
odd. A fortiori, the successor of x is even or the successor of x is odd. Second, assume x is odd ( b ). 
By the hypothesis b. xis odd. By the third axiom of evenness, the successor of xis even. A fortiori, the 
successor of x is even or the successor of x is odd. Thus the successor of x is even or the successor 
of x is odd in both cases Hence, for all numbers x, x is even or x is odd. 

Text 



Fig. 2. Example of a theorem proof constructed using arithmetic grammars in Russian 
and English. 



special kind of programming: it is easier to find a programmer who knows how 
to write a sorting algorithm than one who knows how to write a grammar for 
Russian relative clauses. To make GF widely used outside the latter group of 
programmers, resource grammars have to be created. Experience has shown that 
resource grammars for seemingly very different languages can share an API by 
which different grammars can be accessed in the same way. As a part of future 
work on the resource libraries, it remains to be seen how many divergent extensions 
of the common API are needed for different languages. 
Abstract. By analyzing the semantic representations of 10000 Chinese sentences 
and describing a new sentence analysis method that evaluates semantic preference 
knowledge, we create a model of semantic representation analysis based on the 
correspondence between lexical meanings and conceptual structures, and on the 
relations that underlie those lexical meanings. We also propose a semantic 
argument-head relation that combines ‘basic conceptual structure’ and the 
‘Head-Driven Principle’. With this framework, which differs from Fillmore’s case 
theory (1968) and HPSG, among others, we can successfully disambiguate some 
troublesome sentences and minimize the redundancy in language knowledge 
description for natural language processing. 



1 Introduction 

To enable computer-based analysis of Chinese sentences in natural language texts we 
have developed a semantic framework, using the English language framework created 
by C. Fillmore et al. at UC Berkeley as a starting point. The theoretical framework 
developed in this paper differs from other syntactic and semantic frameworks 
(e.g. Case Grammar and HPSG). First, the frameworks in the literature are either 
purely syntactic or purely semantic. Our framework is largely a semantic one, but it 
adopts some crucial principles of syntactic analysis in the semantic structure 
analysis. Second, some crucial semantic relationships, as exemplified in (1) below, 
which are often neglected in Case Grammar and HPSG, are adequately represented. 
Third, our proposal is based mainly on our own practical large-scale analysis of 
Chinese data. We plan to apply the same framework to the analysis of other 
languages. The overall goal is to offer for each natural sentence a representation 
with semantic relation labeling. 



2 Semantic Relation Labeling 



This workflow includes linking and manually labeling each relation between direct 
semantic units in single sentences, which reflects the different semantic 
representations of the potential realization patterns identified in the formula, and 
describing the relations of each frame’s basic conceptual structure in terms of 
semantic actions. For example, the direct relationships of different semantic units 
in sentence (1) below can be labeled as follows: 



(1) Ta xiao tong-le duzi 
He laugh painful -ASP belly 
‘He laughed so much that his belly was painful.’ 




Within the Case Grammar model, the main verb ‘laugh’ will be taken as the core 
semantic unit and all other noun units are directly associated with this verb. Note 
that an analysis in that framework mistakenly neglects the immediate relationship 
between ‘he’ as a possessor and ‘belly’ as a possession, and that between ‘belly’ as 
entity and ‘painful’ as description. Our approach clearly recognizes those 
relationships while the central nature of the verb is also specified. 
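
The contrast described above can be sketched as a small set of labeled links for sentence (1). This is only an illustration: the `Relation` class, the label names (AGENT, RESULT, POSSESSOR, ENTITY) and the choice to attach ‘belly’ to ‘he’ and ‘painful’ rather than to the verb are assumptions made for the example, not the labeling scheme of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    head: str       # semantic unit that governs the relation
    dependent: str  # semantic unit linked to the head
    label: str      # hypothetical relation label (not from the paper)


# Direct relations for sentence (1): Ta xiao tong-le duzi
relations = [
    Relation("laugh", "he", "AGENT"),
    Relation("laugh", "painful", "RESULT"),
    Relation("belly", "he", "POSSESSOR"),    # 'he' possesses 'belly'
    Relation("painful", "belly", "ENTITY"),  # 'painful' describes 'belly'
]


def direct_links(unit, rels):
    """Return the units directly linked to `unit`, in either direction."""
    linked = set()
    for r in rels:
        if r.head == unit:
            linked.add(r.dependent)
        elif r.dependent == unit:
            linked.add(r.head)
    return linked


# 'belly' is linked to its possessor and its description, not to the verb.
belly_links = direct_links("belly", relations)
verb_links = direct_links("laugh", relations)
```

In a purely verb-centred analysis every noun would appear in `verb_links`; here `belly_links` carries the possessor and description relations instead.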

Link rule 1: Direct Relations Determination. The basic link is the direct link 
between two semantic units. In addition, a set of general rules for determining the 
direct relations has been identified. These are summarized as three major cases: 
1. a direct relationship between a head and its modifier; 2. a direct relationship 
between an action verb and its patient; 3. other cases of direct relationships. 

Link rule 2: ‘Head’ Determination. We have proposed an approach that combines 
‘basic conceptual structure’ and the ‘Head-Driven Principle’. By the ‘Head-Driven 
Principle’, most structures are analyzed as having a modified ‘Head’. The exceptions 
are the ‘Subject-Predicate Structure’ and the ‘Verb-Object Structure’. By employing 
the ‘Head-Driven Principle’ in the construction of a semantic model, some ambiguous 
sentences can be clearly represented. 



3 Feature Labeling 



Based on the analysis of semantic relationships, we have been parsing feature struc- 
tures to express dependencies between semantic features. To avoid the confusion of 
feature classification, we use the features directly included in the sentences. By ab- 
stracting, we take the features exemplified in sentences directly as semantic features 
that link different semantic units in those sentences. For example: 



(3) Ta gezi bu gao. 

His stature isn’t tall. 
He isn’t tall. 
In traditional analysis, ‘stature’ is just a syntactic constituent in a sentence. 
However, the essential meaning of the sentence is ‘he is not tall’; ‘stature’ is a 
semantic feature linking ‘he’ and ‘tall’ together. Thus in our semantic analysis we 
link only ‘he’ and ‘tall’ semantically, and ‘stature’ is taken as a feature marking 
a semantic relationship rather than as an immediate constituent. This Chinese 
semantic structure, after feature abstraction, is very similar to its English 
counterpart, which facilitates translation from one language into the other. 



4 The Advantages of Our Semantic Model 

In developing our semantic tree bank, we also have articulated a framework of ‘Noun- 
Centrality’ as a supplement to the widely assumed ‘Verb-Centrality’ practice. We can 
successfully disambiguate some troublesome sentences, and minimize the redundancy 
in language knowledge description for natural language processing. We automatically 
learn a simpler, less redundant representation of the same information. First, one 
semantic structure may correspond to several syntactic structures in Chinese, and 
this correspondence can be made explicit using our approach. 

(4) Ta da po-le beizi 
She broke up the cup 
‘She broke up the cup.’ 

(5) Ta ba beizi da po-le 
She BA cup broke up 
‘She broke up the cup.’ 

(6) Beizi BEI Ta da po-le 
cup BEI she broke up 
‘The cup has been broken up by her.’ 

The syntactic structures of the above three sentences are clearly different from each 
other. But they nevertheless share the same basic semantic structure: ‘she’ is the 
AGENT, ‘cup’ is the PATIENT, and ‘break up’ is the ACTION verb. 
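
The many-to-one correspondence can be illustrated with a toy role mapper. It is a minimal sketch, assuming pre-tokenized input in which the markers are written `BA` and `BEI`; the function `semantic_frame` and its three word-order patterns are invented for this example and are not the analysis procedure of the paper.

```python
def semantic_frame(tokens):
    """Map a tokenized version of (4)-(6) to AGENT / ACTION / PATIENT roles."""
    if "BEI" in tokens:      # passive order: PATIENT BEI AGENT ACTION
        i = tokens.index("BEI")
        patient, agent, action = tokens[i - 1], tokens[i + 1], tokens[i + 2:]
    elif "BA" in tokens:     # disposal order: AGENT BA PATIENT ACTION
        i = tokens.index("BA")
        agent, patient, action = tokens[i - 1], tokens[i + 1], tokens[i + 2:]
    else:                    # canonical order: AGENT ACTION PATIENT
        agent, action, patient = tokens[0], tokens[1:-1], tokens[-1]
    return {"AGENT": agent, "ACTION": " ".join(action), "PATIENT": patient}


frames = [
    semantic_frame(["ta", "da", "po-le", "beizi"]),         # sentence (4)
    semantic_frame(["ta", "BA", "beizi", "da", "po-le"]),   # sentence (5)
    semantic_frame(["beizi", "BEI", "ta", "da", "po-le"]),  # sentence (6)
]
```

All three entries of `frames` are the same dictionary, which is the sense in which one semantic structure underlies three syntactic ones.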

On the other hand, one syntactic structure may correspond to two or more semantic 
structures; that is, various forms of structural ambiguity are widely observed in 
Chinese. Disregarding the semantic types will cause syntactic ambiguity. If this type 
of information is not available during parsing, important clues will be missing, and 
loss of accuracy will result. Consider (7) below. 

(7) Ta de yifu zuo de piaoliang. 

Her cloth do DE beautiful 

Reading 1: ‘She has made the cloth beautifully.’ 

Reading 2: ‘(Somebody) has made her cloth beautifully.’ 

Syntactically, the sentence, with either one of the above two semantic interpretations, 
should be analyzed having ‘her cloth’ as a subject, ‘do’ as a verb, and ‘beautiful’ as a 
complement. But the two semantic structures have to be properly represented in a 
semantics-oriented Treebank. Under our proposal, the above two different types of 
semantic relations can be clearly represented as follows. 
5 Conclusion 

We have demonstrated two key advantages of our semantic model: a) many ambiguous 
sentences can be clearly represented, and b) redundancy in language knowledge 
description for natural language processing is minimized. 
Abstract. One major goal of human computer interfaces is to simplify the 
communication task. Traditionally, users have been restricted to the language of 
computers for this task. With the emergence of graphical and multimodal interfaces, 
the effort required for working with a computer is decreasing. However, the problem 
of communication is still present, and users still have to attend to the 
communication task when they deal with a computer. Our work focuses on improving the 
communication between the human and the computer. This paper presents the 
foundations of a multimodal dialog model based on a modal logic, which integrates 
speech and action under the same framework. 

Keywords: Human computer spoken interaction, speech acts, multimodal interac- 
tion, and modal logic. 



1 Introduction 

The first dialog systems used speech as the only communication channel. However, 
human communication is strongly multimodal. Lip movements, facial expressions, and 
gestures are all key elements in the human interchange of information. 

Current multimodal dialog systems attempt to integrate several communication 
modalities along with speech. The construction of this kind of system is a complex 
task [8, 10, 13]. It involves several problems, such as speech recognition, natural 
language understanding, knowledge representation, fusion of the different input 
modalities into a coherent message, and the definition of a dialog model. 

This paper focuses on the definition of a dialog model. It presents the foundations 
of a multimodal dialog model based on a modal logic, which represents the rules of 
the conversation and integrates in the same framework the direct actions (those 
accomplished with a direct designation device such as the mouse) and the spoken ones 
(those that the user orally requests from the machine). This consideration is of 
great relevance because spoken actions, unlike direct ones, are not performed 
immediately. Thus, the evolution of a spoken action must be controlled during the 
dialog: from the moment it is proposed until the time it is satisfied. 

The proposed model is supported by the theory of speech acts [1, 12]. It is based on 
the hypothesis that the dialog is driven by the mental states that hold the beliefs, 
desires and intentions of the user. Nevertheless, this model does not attempt to 
impose human behavior on the machine, but only to give it the logical elements to 
support its actions [2]. The idea of treating speech as an action is not original 
(see for instance [3-6, 9]); however, our logic implements a general mechanism in 
which the spoken action is controlled throughout the dialogic interchange. This way, 
it models the convergence of a cooperative dialog [18]. 

The paper is organized as follows. Section 2 presents the complete logic framework. 
Section 3 shows a short but illustrative example of a dialog driven by the proposed 
logic. Finally, Section 4 discusses our conclusions and future work. 



2 A Logic for the Dialog 

This section presents the basis of a logic that models the information interchange 
between a user and a machine. This logic, inspired by previous works [2, 4, 7, 11, 
14-17], proposes the integration of the dialog acts in a framework based on action. 
It contains elements of an epistemic logic (to represent the knowledge), a dynamic 
logic (to describe the action and its effects), and a dialogical logic (to represent 
the obligations and intentions expressed during the dialog). 

2.1 Basic Concepts 

This subsection defines the three basic concepts of our logic: knowledge, action and 
intention. 

The knowledge is represented by the operator s (to know). For instance, the formula 
$Us\,\varphi$ expresses that the user knows the proposition $\varphi$ (to distinguish 
between the user and the machine, the two possible agents of our logic, we use the 
letter U for the user and M for the machine). 

In order to represent the action, we introduce the notion of an event. An event 
$Uf\,\alpha$ is the achievement of the action $\alpha$ by the user (or $Mf\,\alpha$ 
in the case of the machine), which has the proposition $\varphi$ as result. Using the 
notation of dynamic logic we have the formula $[Uf\,\alpha]\,\varphi$. This formula 
indicates that after the execution of $\alpha$ by the user, $\varphi$ is true. An 
action can be a base action, i.e., an elementary instruction, or a task, i.e., a 
sequence of actions organized by a plan. 

The intention is represented by the operator i. Only the user is capable of having 
intentions; thus the formula $Ui\,\varphi$ expresses the intention of the user to 
make $\varphi$ true. 

2.2 Dialog Acts 

A dialog act is an action causing a change in, on the one hand, the task or the 
machine knowledge about the task, and on the other hand, the dialog itself. Hence, a 
dialog act is defined as an event $[Uf\,\alpha](\varphi_a \wedge \varphi_d)$, where 
the result is the set of changes related to the task, $\varphi_a$, and to the dialog, 
$\varphi_d$. 

The changes related to the task are the effects of the action itself ($\varphi_a$), 
while the changes related to the dialog ($\varphi_d$) show the progress in the dialog 
state after each interchange of information. In our case, the goal of any dialog is 
to complete the task intended by the user. This way, each dialog act produces an 
effect on the goal ($\varphi_d$). 
Further, the effects of a spoken action depend on the comprehension of the action by 
the receiver (who may not understand it, or may even perform an erroneous operation), 
and consequently they are not always predictable. In other words, the effects of a 
spoken action only make sense when they are related to the intentions of the speaker. 
Thus, the evolution of a spoken action must be controlled during the dialog: from the 
moment it is proposed until the time it is satisfied. For this reason we define the 
following states for an action: 

?   Proposed goal: an action to be performed. 

+   Reached goal: an accomplished action with no confirmation. 

++  Satisfied goal: a completed (and correct) action. 

@   Aborted goal. 



The dialog acts are expressed, by means of direct and spoken actions, as follows: 

$Uf\,\alpha$, $Mf\,\alpha$       the user or the machine performs $\alpha$ 
$Uff\,\alpha$, $Mff\,\alpha$     the user asks the machine to perform the action $\alpha$ (or vice versa) 
$Ufs\,\phi$, $Mfs\,\phi$         the user informs the machine of $\phi$ (or vice versa) 
$Uffs\,\phi$, $Mffs\,\phi$       the user asks the machine to inform $\phi$ (or vice versa) 

where the action fs is an abbreviation of the base action to share: 
$Ufs\,\phi = Uf\ \mathit{share}\ \phi$. 



2.3 Definition of the Language $L_d$ 

Definition 1. If $T$ is the set of propositional symbols, $Ab$ a finite set of base 
actions, U the symbol naming the user and M the one naming the machine, then the 
language $L_d$ is defined as follows: 

$L_d$ is the smallest superset of $T$ such that: 

- if $\varphi, \psi \in L_d$ then $\neg\varphi,\ \varphi \vee \psi \in L_d$ 
- if $\alpha \in Ac$ and $\varphi \in L_d$ then 
  $Us\,\varphi,\ Ms\,\varphi,\ [Uf\,\alpha]\varphi,\ [Mf\,\alpha]\varphi,\ Ui\,\varphi \in L_d$ 

where $Ac$ is the smallest superset of $Ab$ such that: 

- if $\alpha \in Ab$ then $\alpha \in Ac$ 
- if $\varphi \in L_d$ then $\mathit{verify}(\varphi) \in Ac$ 
- if $\alpha \in Ac$ and $\beta \in Ac$ then $\alpha;\beta \in Ac$ 

We use the abbreviated notations $\varphi \wedge \psi$ for 
$\neg(\neg\varphi \vee \neg\psi)$ and $\varphi \supset \psi$ for 
$\neg(\varphi \wedge \neg\psi)$. The abbreviation true stands for a valid formula, 
e.g. $\varphi \vee \neg\varphi$, and false is an abbreviation of $\neg$true. 

Definition 2. The semantics of the language $L_d$. The class $\mathcal{M}$ of Kripke 
models contains all the tuples $M = \langle S, \pi, R_U, R_M, I_U, r_U, r_M \rangle$ 
such that: 

i) $S$ is the set of possible worlds, or states. 

ii) $\pi$ is a function that assigns truth values to the propositional symbols of $T$ 
in a possible state $s$ ($\pi(s) : T \to \{1, 0\}$ for all $s \in S$). 

iii) $R_U$ is a binary relation among the possible states of $S$. It is the relation 
of accessibility to the user knowledge ($R_U \subseteq S \times S$). 

iv) $R_M$ is a binary relation among the possible states of $S$. It is the relation 
of accessibility to the machine knowledge ($R_M \subseteq S \times S$). 

v) $I_U$ is a binary relation among the possible states of $S$. It is the relation of 
accessibility to the user intentions ($I_U \subseteq S \times S$). 

vi) $r_U$ gives the set of possible states caused by the accomplishment of the action 
$\alpha$ by the user in a possible state $s$ ($r_U : Ac \times S \to \wp(S)$). 

vii) $r_M$ gives the set of possible states caused by the accomplishment of the 
action $\alpha$ by the machine in a possible state $s$ 
($r_M : Ac \times S \to \wp(S)$). 

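Definition 3. Let $M = \langle S, \pi, R_U, R_M, I_U, r_U, r_M \rangle$ be a Kripke 
model of class $\mathcal{M}$. The truth value of a proposition ($\models$) in a 
possible state $s$, based on the model $M$, is inductively defined as follows: 

\[
\begin{array}{lll}
M, s \models \varphi & \text{iff} & \pi(s)(\varphi) = 1 \ \text{ for } \varphi \in T \\
M, s \models \neg\varphi & \text{iff} & M, s \not\models \varphi \\
M, s \models \varphi \vee \psi & \text{iff} & M, s \models \varphi \ \text{ or } \ M, s \models \psi \\
M, s \models [Uf\,\alpha]\varphi & \text{iff} & \forall s' \, [\, s' \in r_U(\alpha, s) \Rightarrow M, s' \models \varphi \,] \\
M, s \models [Mf\,\alpha]\varphi & \text{iff} & \forall s' \, [\, s' \in r_M(\alpha, s) \Rightarrow M, s' \models \varphi \,] \\
M, s \models Us\,\varphi & \text{iff} & \forall s' \, [\, (s, s') \in R_U \Rightarrow M, s' \models \varphi \,] \\
M, s \models Ms\,\varphi & \text{iff} & \forall s' \, [\, (s, s') \in R_M \Rightarrow M, s' \models \varphi \,] \\
M, s \models Ui\,\varphi & \text{iff} & \forall s' \, [\, (s, s') \in I_U \Rightarrow M, s' \models \varphi \,]
\end{array}
\]

where $r_U$ and $r_M$, both denoted $r_A$ for the sake of simplicity since their 
definitions are equivalent, are defined by: 

\[
r_A(\mathit{verify}(\varphi), s) =
\begin{cases}
\{s\} & \text{if } M, s \models \varphi \\
\emptyset & \text{otherwise}
\end{cases}
\qquad
r_A((\alpha;\beta), s) = r_A(\beta, r_A(\alpha, s))
\]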
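
The truth conditions of Definition 3 can be sketched as a small recursive evaluator. This is a hedged illustration only: the tuple encoding of formulas and actions, the dictionary layout of the model, and the two-state example model are assumptions made for the sketch. In particular, applying $r_A$ to a set of states in the sequence clause is read here as the union over the members of that set.

```python
def successors(model, agent, action, s):
    """States reachable when `agent` performs `action` in state `s` (r_A)."""
    kind = action[0]
    if kind == "base":                       # ("base", name)
        return model["r"][agent].get((action[1], s), set())
    if kind == "verify":                     # ("verify", formula)
        return {s} if holds(model, s, action[1]) else set()
    if kind == "seq":                        # ("seq", first, second)
        # Assumption: r_A over a set of states is the union over its members.
        result = set()
        for t in successors(model, agent, action[1], s):
            result |= successors(model, agent, action[2], t)
        return result
    raise ValueError(f"unknown action: {action!r}")


def holds(model, s, formula):
    """Truth of `formula` in state `s`, clause by clause as in Definition 3."""
    kind = formula[0]
    if kind == "atom":                       # ("atom", p)
        return formula[1] in model["val"][s]
    if kind == "not":
        return not holds(model, s, formula[1])
    if kind == "or":
        return holds(model, s, formula[1]) or holds(model, s, formula[2])
    if kind == "know":                       # ("know", agent, f): Us f / Ms f
        _, agent, f = formula
        return all(holds(model, t, f) for (u, t) in model["R"][agent] if u == s)
    if kind == "intend":                     # ("intend", f): Ui f (user only)
        return all(holds(model, t, formula[1]) for (u, t) in model["I"] if u == s)
    if kind == "after":                      # ("after", agent, action, f): [Af a] f
        _, agent, action, f = formula
        return all(holds(model, t, f) for t in successors(model, agent, action, s))
    raise ValueError(f"unknown formula: {formula!r}")


# Hypothetical two-state model: the machine's base action "move" leads from s0 to s1.
MODEL = {
    "val": {"s0": set(), "s1": {"moved"}},
    "R": {"U": {("s0", "s0"), ("s1", "s1")}, "M": {("s0", "s0"), ("s1", "s1")}},
    "I": {("s0", "s1")},
    "r": {"U": {}, "M": {("move", "s0"): {"s1"}}},
}
MOVED = ("atom", "moved")
GUARDED_MOVE = ("seq", ("verify", ("not", MOVED)), ("base", "move"))
```

For instance, `holds(MODEL, "s0", ("after", "M", ("base", "move"), MOVED))` evaluates the formula $[Mf\,\mathit{move}]\,\mathit{moved}$ in state s0, and `successors(MODEL, "M", GUARDED_MOVE, "s1")` is empty because the verification fails once the object has already been moved.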



2.4 Definition of the Axioms 

This subsection presents the main axioms of our logic. The first part describes the 
axioms about the knowledge, the second part introduces some concepts related with 
the goal evolution, and the third part explains the axioms about the cooperative dia- 
log. 

2.4.1 Knowledge Characterization 

The following axioms describe the machine and the user knowledge and, in the case of 
the machine, they characterize its knowledge about the user intentions. 

Let $A \in \{U, M\}$: 

(A1) $As\,\varphi \wedge As(\varphi \supset \psi) \supset As\,\psi$        axiom K 

(A2) $As\,\varphi \supset \varphi$ 

The user and the machine know only true facts 

(A3) $As\,\varphi \supset As\,As\,\varphi$        positive introspection 

The user and the machine know the facts that they know 

(A4) $\neg As\,\varphi \supset As\,\neg As\,\varphi$        negative introspection 

The user and the machine know which facts they do not know 
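
As a sanity check, axioms (A1)-(A4) can be verified by brute force on a small model. The sketch below assumes that the accessibility relation of the agent is an equivalence relation, which is the standard frame condition under which (A2)-(A4) are valid; the paper does not state its frame conditions, so this choice, the three worlds and the helper names are all assumptions of the example.

```python
from itertools import product

WORLDS = ["w0", "w1", "w2"]
ALL = set(WORLDS)

# Hypothetical accessibility relation for one agent: an equivalence relation
# whose classes are {w0, w1} and {w2}.
R = {(a, b) for a, b in product(WORLDS, repeat=2) if (a == "w2") == (b == "w2")}


def knows(prop):
    """Worlds where 'As prop' holds; `prop` is the set of worlds where it is true."""
    return {s for s in WORLDS if all(t in prop for (u, t) in R if u == s)}


def implies(p, q):
    """Worlds where 'p implies q' holds."""
    return (ALL - p) | q


def valid(p):
    """A proposition is valid in the model when it holds in every world."""
    return p == ALL


# Every proposition over the three worlds, as the set of worlds where it is true.
PROPS = [{w for w, keep in zip(WORLDS, bits) if keep}
         for bits in product([False, True], repeat=len(WORLDS))]

a1 = all(valid(implies(knows(p) & knows(implies(p, q)), knows(q)))
         for p in PROPS for q in PROPS)
a2 = all(valid(implies(knows(p), p)) for p in PROPS)
a3 = all(valid(implies(knows(p), knows(knows(p)))) for p in PROPS)
a4 = all(valid(implies(ALL - knows(p), knows(ALL - knows(p)))) for p in PROPS)
```

Each of `a1` to `a4` records whether the corresponding axiom holds for every proposition of this model.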

2.4.2 Goal Evolution 

In our model, the structure of the dialog is based on the user intention. This 
intention is expressed as an action (or plan) whose effect is the desired final 
state. Thus, the realization of a dialog act generally produces a movement towards 
the goal. This movement depends on the current situation and on the dialog act. It is 
represented by three states: (i) a proposed goal, when the user orders the machine to 
execute an action ($Uff$) or asks it for information ($Uffs$); (ii) a reached goal, 
when the machine responds ($Mfs$) or executes the requested action ($Mf$); (iii) 
finally, a satisfied goal, when the user is pleased with the machine response. Of 
course, the user can abort a goal at any moment. The following paragraphs describe 
this evolution in detail. 
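
The goal evolution just described can be read as a small state machine. The following sketch is one possible rendering under stated assumptions: the act names `Uff`, `Uffs`, `Mf` and `Mfs` come from the text, while `accept` and `abort`, the `GoalState` enum and the transition table are hypothetical names introduced for the example.

```python
from enum import Enum


class GoalState(Enum):
    PROPOSED = "?"
    REACHED = "+"
    SATISFIED = "++"
    ABORTED = "@"


# (current state, dialog act) -> next state. None means "no goal yet".
TRANSITIONS = {
    (None, "Uff"): GoalState.PROPOSED,               # user orders an action
    (None, "Uffs"): GoalState.PROPOSED,              # user asks for information
    (GoalState.PROPOSED, "Mf"): GoalState.REACHED,   # machine executes the action
    (GoalState.PROPOSED, "Mfs"): GoalState.REACHED,  # machine responds
    (GoalState.REACHED, "accept"): GoalState.SATISFIED,  # hypothetical act name
}


def step(state, act):
    """Advance the goal state by one dialog act."""
    # The user may abort a goal that is still open (hypothetical act name "abort").
    if act == "abort" and state in (GoalState.PROPOSED, GoalState.REACHED):
        return GoalState.ABORTED
    try:
        return TRANSITIONS[(state, act)]
    except KeyError:
        raise ValueError(f"act {act!r} not allowed in state {state!r}") from None


def run(acts):
    """Fold a sequence of dialog acts into the final goal state."""
    state = None
    for act in acts:
        state = step(state, act)
    return state
```

For example, `run(["Uff", "Mf", "accept"])` ends in `GoalState.SATISFIED`, whereas `run(["Uffs", "abort"])` ends in `GoalState.ABORTED`.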



The User Asks the Machine to Perform an Action 

i) A proposed goal is the effect of a user request. It is expressed as a user 
intention (that the machine performs some action) integrated with the machine 
knowledge. 

$[Uff\,\alpha](Ms\,Ui\,[Mf\,\alpha]\varphi) = [Uff\,\alpha](?\,[Mf\,\alpha]\varphi)$ 

Here the abbreviation $?Mf\,\alpha$ designates the action $Mf\,\alpha$ as a proposed 
goal. 

ii) A reached goal emerges when the action requested by the user becomes true. 

$Ms\,Ui\,[Mf\,\alpha]\varphi \wedge [Mf\,\alpha]\varphi$ 
$= \ ?\,[Mf\,\alpha]\varphi \wedge [Mf\,\alpha]\varphi$ 
$= \ +\,[Mf\,\alpha]\varphi$ 

where the abbreviation $+Mf\,\alpha$ designates the action $Mf\,\alpha$ as a reached 
goal. 

iii) A satisfied goal materializes when the user admits the action of the machine as 
acceptable. This acceptance can be explicit (when the user informs the machine that 
his intention is no longer related to the proposed action), or implicit (when the 
user asks the machine to perform a different action, not related to the previous 
one). 

$Ms\,Ui\,[Mf\,\alpha]\varphi \wedge ([Ufs\ U\neg i\,[Mf\,\alpha]\varphi]\,\varphi \vee ([Uff\,\beta]\gamma \wedge \neg \mathit{rel}(\varphi, \beta)))$ 
$= \ ?\,[Mf\,\alpha]\varphi \wedge ([Ufs\ U\neg i\,[Mf\,\alpha]\varphi]\,\varphi \vee ([Uff\,\beta]\gamma \wedge \neg \mathit{rel}(\varphi, \beta)))$ 
$= \ {+}{+}\,[Mf\,\alpha]\varphi$ 

Here, the abbreviation $++Mf\,\alpha$ designates the action $Mf\,\alpha$ as a 
satisfied goal. 

iv) An aborted goal occurs when the user informs the machine that he no longer 
desires the achievement of the action in progress. 

$Ms\,Ui\,[Mf\,\alpha]\varphi \wedge (Ufs\ Ui\,[M\neg f\,\alpha]\varphi)$ 
$= \ ?\,[Mf\,\alpha]\varphi \wedge (Ufs\ Ui\,[M\neg f\,\alpha]\varphi)$ 
$= \ @\,[Mf\,\alpha]\varphi$ 

Here, the abbreviation $@Mf\,\alpha$ designates the action $Mf\,\alpha$ as an aborted 
goal. 

The User Asks the Machine to Inform Something 

In this case (a question $Uffs$), the goal has the same evolution as in the previous 
one, but the machine response is of the type $Mfs$. Recall that ffs is a short form 
for ff share, where to share is a base action. 

$[Uffs\,\phi](Ms\,Ui\,[Mfs\,\phi])$ 
$= [Uffs\,\phi](?\,[Mfs\,\phi])$ 

Here the abbreviation $?Mfs\,\phi$ designates the sharing of information $Mfs\,\phi$ 
as a proposed goal. 

The Machine Asks the User to Inform Something 

In this framework for human computer interaction, the machine has no intentions. 
However, it can take some initiative when the information required to complete a task 
is incomplete. Basically, the machine can generate a subdialog in order to request 
some complementary information from the user. Unlike a user request, a subgoal 
proposed by the machine evolves through just two stages, proposed and satisfied, 
unless it is aborted. 

i) A proposed subgoal is the effect of a machine request for complementary 
information from the user. 

$[Mffs\,\phi](Ms\,Ui\,[Ufs\,\phi]\varphi) = [Mffs\,\phi](?\,[Ufs\,\phi]\varphi)$ 

Here the abbreviation $?Ufs\,\phi$ designates the action $Ufs\,\phi$ as a proposed 
subgoal. 

ii) A reached, and consequently satisfied, subgoal materializes when the answer 
awaited by the machine becomes true. 

$Ms\,Ui\,[Ufs\,\phi]\varphi \wedge [Ufs\,\phi]\varphi$ 
$= \ ?\,[Ufs\,\phi]\varphi \wedge [Ufs\,\phi]\varphi$ 
$= \ {+}{+}\,[Ufs\,\phi]\varphi$ 

Here the abbreviation $++Ufs\,\phi$ designates the action $Ufs\,\phi$ as a satisfied 
subgoal. 

iii) An aborted subgoal occurs when the user informs the machine that he will not 
answer the given request. 

$Ms\,Ui\,[Ufs\,\phi]\varphi \wedge [Ufs\,[U\neg fs\,\phi]\varphi]$ 
$= \ ?\,[Ufs\,\phi]\varphi \wedge [Ufs\,[U\neg fs\,\phi]\varphi]$ 
$= \ @\,[Ufs\,\phi]\varphi$ 

Here, the abbreviation $@Ufs\,\phi$ designates the action $Ufs\,\phi$ as an aborted 
subgoal. 

2.4.3 Cooperative Dialog 

As explained in the above sections, the cooperative dialog is produced around a task, 
where the machine is just a collaborator in its achievement. In our case, the machine 
has the obligation to resolve the proposed goal; for this, its spoken intervention 
will be necessary when it lacks the information to complete the task. The following 
axioms describe the cooperative dialog under these considerations. 

The Machine Is Obligated to Reach the Proposed Goal 

If the user orders the realization of a task, and the machine knows the plan ($Msf$, 
knows how to do it), then the machine executes this plan. 

(A5.i) $?\,[Mf\,\alpha]\varphi \wedge Msf\,\alpha \wedge \alpha \in Ab \supset [Mf\,\alpha]\varphi$ 

for a base action (i.e. an elementary instruction) 

(A5.ii) $?\,[Mf(\alpha_1;\alpha_2)](\varphi_1 \wedge \varphi_2) \wedge Msf\,\alpha_1 \supset [Mf\,\alpha_1](\varphi_1 \wedge [Uff\,\alpha_2](?\,[Mf\,\alpha_2]\varphi_2))$ 

for a complex task (i.e. a sequence of basic actions) 



The Ignorance of the Machine Generates a Question 

In this case, the machine knows the proposed task, but requires more information to 
accomplish it. Consequently, the task must be stopped until the information is 
complete. The machine executes two complementary actions before the task: (1) it asks 
the user for the required information, and (2) it verifies the user answer. 

(A6) $?\,[Mf\,\alpha]\varphi \wedge Msf\,\alpha \wedge M\neg s\ \mathit{parameter}(\alpha, \phi) \supset$ 
     $?\,[Mffs\ \mathit{parameter}(\alpha, \phi)]$ 
     $(\,?\,[Ufs\ \mathit{parameter}(\alpha, \phi)]\ ;$ 
     $\ \mathit{verify}({+}{+}Ufs\ \mathit{parameter}(\alpha, \phi))\ ;\ [Mf\,\alpha]\varphi\,)$ 



3 A Brief Example 

The following example shows the structure of a short dialog where the interchanges 
of information converge to a goal in a design domain. For this example, we take a 
small fragment of a dialog from the DIME corpus [19]. This corpus was constructed 
for studying multimodal interaction in the domain of kitchen design. In the selected 
fragment the user asks the system to move the kitchen sink. However, he does not 
specify the new position, which causes the system to open a subdialog (see step 4) in 
order to obtain this information. 

The knowledge of the system at this moment: 

(C1) Msf move(Obj, NewLocation)        The machine knows the base action to move 

utt259: u: now move the kitchen sink (ahora a recorrer el fregadero) 

[Uff move(obj53, x)] 

1. [Uff move(obj53, x)] (? Mf move(obj53, x)) 

Definition of Uff; the proposed goal 

2. ? [Mffs parameter(move(obj53, x), x)] 
   (? [Ufs parameter(move(obj53, x), x)] ; 
   verify(++Ufs parameter(move(obj53, x), x)) ; 
   Mf move(obj53, x)) 

1, C1, A6; a generated subgoal 

3. [Mffs parameter(move(obj53, x), x)] 
   (? [Ufs parameter(move(obj53, x), x)] ; 
   verify(++Ufs parameter(move(obj53, x), x)) ; 
   Mf move(obj53, x)) 

2, A5.i 

[Mffs parameter(move(obj53, x), x)] 

utt260: s: where do you want it? (¿a dónde quieres que lo ponga?) 

4. (? [Ufs parameter(move(obj53, x), x)] ; 
   verify(++Ufs parameter(move(obj53, x), x)) ; 
   Mf move(obj53, x)) 

Definition of Mffs; proposed subgoal 

utt261: u: move it to the dishwasher (recorrerlo hacia la máquina lava trastes) 

[Ufs parameter(move(obj53, loc45), loc45)] 

5. [Ufs parameter(move(obj53, loc45), loc45)] 
   (Ms parameter(move(obj53, loc45), loc45)) 

Definition of Ufs 

6. ++Ufs parameter(move(obj53, loc45), loc45) 

4, 5; satisfied subgoal 

7. [Mf move(obj53, loc45)] (hold(location(obj53, loc45))) 

3, 6, and the definition of move 

[Mf move(obj53, loc45)] 

M: <reallocation of the kitchen sink to the new position in the graphical context> 

8. +[Mf move(obj53, loc45)] 

1, 7; reached goal 
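
The derivation above can be mimicked by a few lines of code that record the goal states as the dialog unfolds. This is a simplified sketch, not the inference procedure of the logic: the function `cooperative_dialog`, its arguments and the string rendering of the acts are assumptions made for illustration, and the verification step is collapsed into accepting any answer that is given.

```python
def cooperative_dialog(action, obj, location, answers):
    """Return the (state, act) trace for a request; `location` may be None."""
    answers = iter(answers)
    shown = location if location is not None else "x"
    trace = [("?", f"Mf {action}({obj}, {shown})")]            # proposed goal
    if location is None:                                       # (A6): parameter unknown
        # The machine asks for the missing parameter: proposed subgoal.
        trace.append(("?", f"Ufs parameter({action}({obj}, x), x)"))
        location = next(answers, None)
        if location is None:                                   # user declines to answer
            trace.append(("@", f"Ufs parameter({action}({obj}, x), x)"))
            return trace
        # The answer arrives: the subgoal is reached and therefore satisfied.
        trace.append(("++", f"Ufs parameter({action}({obj}, {location}), {location})"))
    trace.append(("+", f"Mf {action}({obj}, {location})"))    # (A5.i): reached goal
    return trace


# utt259-utt261: the user asks to move the sink, then supplies the location.
trace = cooperative_dialog("move", "obj53", None, ["loc45"])
states = [state for state, _ in trace]
```

Here `states` lists the goal markers in the order in which they arise, mirroring steps 1, 4, 6 and 8 of the example.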



4 Conclusions 

This paper establishes the basis of a multimodal dialog model. In this model, the 
user activities are characterized by his goals, which at the same time give a 
structure to the dialog. 

The proposed dialog model is based on a modal logic framework. This logic framework 
takes action as its central element. Thus, a spoken intervention is regarded as just 
another form of action. This way, the dialog leads to the execution of an action, and 
the action causes the dialog. 

In addition, our logic framework describes the dialog (i.e., the interchange of 
information) as the evolution of the goal proposed by the user. This evolution is a 
sequence of spoken and direct actions defined by the user intentions and the machine 
knowledge. 

As future work, we plan to define new axioms that describe other phenomena of 
multimodal conversations, such as the resolution of misunderstandings arising from 
communication problems. 
Abstract. The aim of this paper is to present a model for the interpretation of 
imperative sentences in which reasoning agents play the role of speakers and 
hearers. A requirement is associated with both the person who makes and the 
person who receives the order, which prevents the hearer coming to 
inappropriate conclusions about the actions s/he has been commanded to do. By 
relating imperatives with the actions they prescribe, the dynamic aspect of 
imperatives is captured. Further, by using the idea of encapsulation, it is 
possible to distinguish what is demanded by an imperative from the inferential 
consequences of the imperative. These two ingredients provide agents with the 
tools to avoid inferential problems in interpretation. 



1 Introduction 

There is a move to produce formal theories which attempt to capture different aspects 
of agents, such as the ability to reason, plan, and interpret language. Some such 
theories seek to formalize power relations between agents, where an agent can make 
other agents satisfy his/her goals (e.g. [10; 11]). Here we present a model in which 
agents represent speakers and hearers. Once an agent has uttered an order, the main 
role of the agent addressed is to interpret it and decide what course of action s/he 
needs to follow, so that the order given can be satisfied. Nevertheless, without care, 
such autonomous reasoning behavior might lead to inappropriate inferences, as we 
shall see. In the specific case of the interpretation of imperatives, there is an 
additional problem: imperatives do not denote truth values. The term practical 
inference has been used to refer to inferential patterns involving imperatives. For 
instance, if an agent A is addressed with the order Love your neighbours as yourself! 
and A realizes that Alison is one of those referred to as his/her neighbours, then 
A could infer Love Alison as yourself, even though the order given cannot be true or 
false [9; 13; 18]. 

Formalizations in which imperatives are translated into statements of classical 
logic are problematic as they can lead an agent to draw inappropriate conclusions. In 
those approaches, if an agent A is given the order Post the letter!, s/he can 
erroneously infer that s/he has been ordered to Post the letter or burn the letter! by 
using the rule of introduction for disjunction. Thus, having a choice, agent A might 
decide to burn the letter. In deontic approaches this is known as the Paradox of Free 
Choice Permission, which was thought to be an unsolved problem as recently as 1999 
[17]. 

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 56-67, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 

Here we present a model which does not suffer from this kind of paradoxical 
behaviour. It involves the following ingredients: a) agents with the ability to 
interpret imperative sentences within b) a context. It also captures c) the dynamic 
aspect of imperatives, so that imperatives are not translated into truth-denoting 
statements. Finally, d) encapsulation makes agents capable of distinguishing what is 
uttered from what is not, thus avoiding ‘putting words in the mouth of the speaker’. 

The rest of the paper is organized as follows. First as a preamble to the model, the 
concepts of imperative, context and requirement are defined. Then a formalization is 
presented followed by examples illustrating that the model overcomes inferential 
problems in the interpretation of imperatives. The paper ends with some conclusions. 



2 Analysis 

In this section, we describe some of the main concepts which need to be addressed by 
the model in which agents interpret imperatives. As a first step we define imperative 
sentences as they are considered in this paper. 

Definition: Imperative 

Imperatives are sentences used to ask someone to do or not to do something 
and that do not denote truth-values. 

This definition introduces a distinction between different sentences used to ask 
someone to do something. Following the definition, Come here! might convey the 
same request as I would like you to come here. However, the former does not denote 
a truth value, whereas the latter does. The former provides an example of the kind 
of sentence that we shall address here. It is worth mentioning that the ‘something’ 
which is requested in an imperative shall be called a requirement. Other examples of 
imperatives are: a) direct: Come here!; b) negative: Don’t do that!; c) conjunctive: 
Sit down and listen carefully!; d) disjunctive: Shut up or get out of here!; 
e) conditional: If it is raining, close the window! 



2.1 Context 

It is widely accepted that the interpretation of utterances is context dependent. For 
instance, the imperative Eat!, said by a mother to her son, might be an order. 
However, said to a guest it might be only an invitation to start eating. The real 
meaning depends on context. 

Many authors agree that context is related to people’s view or perception of the 
world or a particular situation rather than the world or the situation themselves [2; 
15]. That is, context is conceived in terms of what agents have in their minds. After 
all, this is what an agent uses to interpret a sentence. This might include 
intentions, beliefs, knowledge, etc. However, we will subscribe to the following 
definition. 

Definition: Context 

A context is a consistent collection of propositions that reflects a relevant 
subset of agents’ beliefs. 

This view will not commit us here to an ontology or classification of components 
or to the use of operators such as B for beliefs and K for knowledge (Turner [16]). 
We simply assume that everything that constitutes a context can be represented in 
terms of propositions, so the context is viewed as a consistent set of propositions [3]. 



2.2 Dynamic Aspect of Imperatives 

Different authors have related imperatives and actions (Ross [13], von Wright [17], 
Hamblin [6] p. 45 and Segerberg [14] among others). Sometimes it is said that 
imperatives prescribe actions. Nevertheless, it would be more precise to say that 
imperatives possess a dynamic aspect. For instance, I would like you to open the door 
and Open the door! might convey the same request. However, the former is a 
statement which denotes a truth value. It can be true or false within a state of affairs, 
but there is no dynamic aspect in it. The latter, by contrast, does not denote a truth 
value, but if we assume that it is uttered in a state of affairs in which the door is closed, 
it demands another, future and wished-for state of affairs in which the door is open. That 
is, it demands a change of states; it involves a dynamic aspect (Fig. 1). This suggests 
that translating imperatives into statements is the wrong approach; it does not model a 
basic aspect of imperatives. 



Fig. 1. Dynamic aspect of imperatives. The imperative Open the door! leads from the 
initial state Si, where the pre-conditions P hold (door closed), to the final state Sf, 
where the post-conditions Q hold (door open). 



2.3 Evaluation of Imperatives and Correctness 

When an agent interprets an imperative, s/he also evaluates it. For instance, in the 
example above, Open the door! would not make sense in a state of affairs where the 
door is already open. It seems that imperatives impose some pre-conditions that the 
agent verifies during the process of interpretation; the door must be closed. 
Complying with an imperative will produce a result, a post-condition which shall 
indicate that the order has been satisfied; the door will be open. Thus, the dynamic 
aspect of imperatives provides us with at least three components, namely pre- 
conditions, imperative, and post-conditions. This resembles what is known as Hoare's 
triple [8]. In 1969 Hoare proposed a logic to verify the correctness of programs. He 
proposed to evaluate triples P{S}Q, where S is a program, P are its pre-conditions, 
and Q are its post-conditions. According to Hoare, the program S is correct iff, 
whenever the assertion P is true before initiation of S, the assertion Q is true on its 
completion. Since the interpretation of imperatives can be construed as involving a 
verification process, here we adopt the concept of correctness of an imperative, which 
is defined analogously by using Hoare’s triple P{Imp}Q. 

Definition: Correctness of an Imperative 

The imperative Imp is correct with respect to a state of affairs Si iff P is 
the case in Si and Q is the case in the state Sf reached after the imperative 
is satisfied. 

An imperative is satisfied when the agent addressed complies with the imperative, 
reaching the state wished by the speaker. 
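
The definition can be illustrated with a minimal sketch of the door example. States are modelled as dictionaries and the imperative as a function from the initial state to the state reached once it is satisfied; the names `is_correct`, `open_the_door` and the key `door_open` are assumptions made for this illustration, not part of the model.

```python
from typing import Callable, Dict

State = Dict[str, bool]


def is_correct(pre: Callable[[State], bool],
               imperative: Callable[[State], State],
               post: Callable[[State], bool],
               initial: State) -> bool:
    """P{Imp}Q w.r.t. an initial state: P holds in Si and Q holds in the
    state Sf reached after the imperative is satisfied."""
    if not pre(initial):
        return False
    final = imperative(initial)
    return post(final)


def open_the_door(state: State) -> State:
    """Hypothetical effect of complying with 'Open the door!'."""
    new_state = dict(state)
    new_state["door_open"] = True
    return new_state


def door_closed(state: State) -> bool:   # pre-condition P
    return not state["door_open"]


def door_open(state: State) -> bool:     # post-condition Q
    return state["door_open"]


closed_state = {"door_open": False}
already_open_state = {"door_open": True}

print(is_correct(door_closed, open_the_door, door_open, closed_state))
print(is_correct(door_closed, open_the_door, door_open, already_open_state))
```

In the second call the pre-condition fails, which mirrors the observation that Open the door! does not make sense when the door is already open.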



2.4 Encapsulation 

A program is a sequence of instructions encapsulated in a file. The file contains the 
instructions that a programmer wants a computer to perform. Hoare logic would allow 
us to verify the correctness of such a program, and make derivations, but it will not 
derive a new program. That is, it is not assumed that the programmer wants the 
computer to perform any derivation during the verification of the correctness of a 
program. We shall use this idea to distinguish what an agent is commanded to do, so 
that logical derivations will not be considered new imperatives. In fact this also 
corresponds to the use of imperatives. If an agent is given the order Close all the 
windows! while being in a house, and s/he realizes that the kitchen's window is open, 
then the agent might conclude that s/he should close that window, as a derivation of 
the order given. However, the agent will not assume that his/her inferential derivation, 
Close the kitchen's window, is an imperative uttered by the speaker. 

Now we present the model, illustrating how it is able to describe the main features 
of imperatives and how it overcomes the paradoxical behavior faced by other 
approaches. 



3 Model 

$L_{ImpA}$ is a dynamic language, defined along the lines of first-order dynamic logic as in 
Harel [7]. In this language Hoare's triples can be represented and, therefore, so can 
the concept of requirement. The ability of an agent (the actions that an agent is able to 
perform) can also be represented. Its interpretation will allow us to verify validity 
with respect to a context. 



3.1 Definition of Sets 

We define the following sets. $C = \{c, c_1, c_2, \ldots\}$ is a set of constant symbols. 
Analogously we define sets for variable symbols ($V$); function symbols ($F$); regular 
constant symbols ($C$); speaker constant symbols ($CS$); speaker variable symbols ($S$); 
hearer constant symbols ($CH$); hearer variable symbols ($H$); atomic actions ($AtAct$); 
atomic predicate symbols ($AtPred$); and we assume that $AC = C \cup CS \cup CH$ and 
$AV = V \cup S \cup H$. 



3.2 Definition of Terms 

Terms are defined recursively by: $t ::= c \mid cs \mid ch \mid v \mid s \mid h \mid f(t_1, t_2, \ldots, t_n)$. 
Thus, a term is a regular constant ($c$), a speaker constant ($cs$), a hearer constant ($ch$), 
a regular variable ($v$), a speaker variable ($s$), a hearer variable ($h$) or a function 
$f(t_1, t_2, \ldots, t_n)$ of arity $n$ ($n$ arguments), where $t_1, t_2, \ldots, t_n$ are terms. 
The expressions $ts ::= cs \mid s$ and $th ::= ch \mid h$ define the terms for speakers and 
hearers respectively as constants or variables. 



3.3 Definition of wff of the Language $L_{ImpA}$ 

The set $FOR$ contains all possible wffs in $L_{ImpA}$ and the set $Act$ contains all possible 
actions defined in the category of actions. The definition of the language $L_{ImpA}$ is given 
by $\phi ::= p(t_1, t_2, \ldots, t_n) \mid t_1 = t_2 \mid \neg\phi \mid \phi_1 \land \phi_2 \mid \exists x\phi \mid [\alpha]\phi$. 
In other words, if $p \in AtPred$, $t_1, t_2, \ldots, t_n$ are terms, $x \in V$, and $\alpha \in Act$, 
then $p(t_1, t_2, \ldots, t_n)$ is an atomic predicate with arity $n$. $t_1 = t_2$ is the equality 
test (=). $\neg\phi$ is the negation of $\phi$. $\phi_1 \land \phi_2$ is the conjunction of $\phi_1$ 
and $\phi_2$. $\exists x\phi$ is the existential quantifier. $[\alpha]\phi$ is a modal expression 
indicating that $\phi$ holds after the action $\alpha$ is performed. The usual abbreviations are 
assumed: $\phi_1 \lor \phi_2 = \neg(\neg\phi_1 \land \neg\phi_2)$, $\phi_1 \to \phi_2 = \neg\phi_1 \lor \phi_2$, 
$\phi_1 \leftrightarrow \phi_2 = (\phi_1 \to \phi_2) \land (\phi_2 \to \phi_1)$, $\forall x\phi = \neg\exists x\neg\phi$ 
and $\langle\alpha\rangle\phi = \neg[\alpha]\neg\phi$. 



3.4 Category of Actions 

The set $Act$ of actions is defined as follows: 
$\alpha ::= a(t_1, t_2, \ldots, t_n) \mid \phi? \mid \alpha_1;\alpha_2 \mid \alpha_1+\alpha_2 \mid (\alpha)_{ts,th} \mid (\alpha)_{th}$. 
In other words, if $\alpha, \alpha_1, \alpha_2 \in Act$, $t_1, t_2, \ldots, t_n$ are terms and $ts$, $th$ are 
terms for speaker and hearer respectively, then $a(t_1, t_2, \ldots, t_n)$ is the atomic action. 
$\alpha_1;\alpha_2$ is the sequential composition of actions. $\alpha_1+\alpha_2$ is the disjunction of 
actions. $\phi?$ is a test and it just verifies whether $\phi$ holds or not. $(\alpha)_{ts,th}$ is a 
requirement, an action requested directly or derived from a requested one by a speaker $ts$ to a 
hearer $th$. $(\alpha)_{th}$ is an action that a hearer $th$ is able to do. In this way, we keep 
track of the agents involved in a requirement, whether uttered or derived. 
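
The grammar of actions lends itself to a small abstract syntax tree. The sketch below is only one possible encoding, assuming formulas and terms are kept as plain strings; the class names (`Atomic`, `Check`, `Seq`, `Choice`, `Requirement`, `Ability`) and the printer `show` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:            # atomic action a(t1, ..., tn)
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Check:             # test: formula followed by '?'
    formula: str


@dataclass(frozen=True)
class Seq:               # sequential composition: first;second
    first: "Action"
    second: "Action"


@dataclass(frozen=True)
class Choice:            # disjunction of actions: left+right
    left: "Action"
    right: "Action"


@dataclass(frozen=True)
class Requirement:       # action demanded by a speaker of a hearer
    action: "Action"
    speaker: str
    hearer: str


@dataclass(frozen=True)
class Ability:           # action a hearer is able to do
    action: "Action"
    hearer: str


Action = Union[Atomic, Check, Seq, Choice, Requirement, Ability]


def show(a: Action) -> str:
    """Render an action in a notation close to the one used in the text."""
    if isinstance(a, Atomic):
        return f"{a.name}({', '.join(a.args)})"
    if isinstance(a, Check):
        return f"{a.formula}?"
    if isinstance(a, Seq):
        return f"{show(a.first)};{show(a.second)}"
    if isinstance(a, Choice):
        return f"{show(a.left)}+{show(a.right)}"
    if isinstance(a, Requirement):
        return f"({show(a.action)})_{{{a.speaker},{a.hearer}}}"
    if isinstance(a, Ability):
        return f"({show(a.action)})_{{{a.hearer}}}"
    raise TypeError(f"not an action: {a!r}")


# 'If it is raining, close the window!' read as a test followed by an action.
conditional = Requirement(
    Seq(Check("raining"), Atomic("close", ("window",))), "s", "h")
print(show(conditional))
```

Keeping speaker and hearer as fields of `Requirement` reflects the point made above: the agents involved stay attached to the action wherever it appears in a larger expression.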



3.5 Representation of Requirements 

Requirements are represented in terms of the actions prescribed, with explicit 
reference to the speaker who demands, and the hearer who is being addressed. 

Because of the dynamic aspect of imperatives, they are associated with the actions 
they prescribe; therefore the dynamic operators must be used between them. Thus, the 
sequencing operator (;) models a conjunction of requirements, the choice operator (+) 
models a disjunction of requirements, and a conditional requirement is represented by 
using the symbol $\Rightarrow$, where $(\phi \Rightarrow \alpha) = (\phi?;\alpha)$. Following Harel [7] 
and Gries [5], a Hoare's triple $P\{\alpha\}Q$ can be represented in $L_{ImpA}$ as $P \to [\alpha]Q$. 
Thus, $P \to [(a)_{ts,th}]Q$ is an atomic requirement, $P \to [(\alpha_1;\alpha_2)_{ts,th}]Q$ is a 
conjunction of requirements, $P \to [(\alpha_1+\alpha_2)_{ts,th}]Q$ is a disjunction of requirements, 
and $P \to [(\phi?;\alpha)_{ts,th}]Q$ is a conditional requirement. 



3.6 Axioms 

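A0) $\top$ (any tautology); 
A1) $[\phi?;\alpha]\psi \leftrightarrow (\phi \to [\alpha]\psi)$; 
A2) $[(\phi?;\alpha)_{ts,th}]\psi \leftrightarrow (\phi \to [(\alpha)_{ts,th}]\psi)$; 
A3) $[(\phi?;\alpha)_{th}]\psi \leftrightarrow (\phi \to [(\alpha)_{th}]\psi)$; 
A4) $[\alpha_1;\alpha_2]\phi \leftrightarrow [\alpha_1]([\alpha_2]\phi)$; 
A5) $[\alpha_1+\alpha_2]\phi \leftrightarrow [\alpha_1]\phi \land [\alpha_2]\phi$; 
A6) $[\phi?]\psi \leftrightarrow (\phi \to \psi)$; 
A7) $[\alpha](\phi \to \psi) \to ([\alpha]\phi \to [\alpha]\psi)$; 
A8) $\forall x\phi(x) \to \phi(t)$ provided that $t$ is free for $x$ in $\phi(x)$; 
A9) $\forall x(\phi \to \psi) \to (\phi \to \forall x\psi)$ provided that $x$ is not free in $\phi$. 

Furthermore we can relate requirements and ability of hearers: 
shA1) $[(\alpha)_{ts,th}]\psi \to [(\alpha)_{th}]\psi$; 
hA2) $[(\alpha)_{th}]\psi \to [\alpha]\psi$. 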

Axioms A0)-A7) are standard in dynamic logic, A2) and A3) explicitly include 
speakers and hearers, and A8)-A9) are standard in predicate logic. 
shA1) is analogous to Chellas's (1971: p. 125) axiom where ‘ought’ implies ‘can’ in his 
model of imperatives through Obligation and Permission. Here shA1) expresses that 
if a requirement $\alpha$ demanded by $ts$ of $th$ is correct, then there is some action, usually 
a sequence $\alpha = a_1;a_2;\ldots;a_n$ of actions, that the hearer is able to perform, so that $\alpha$ 
can be satisfied. hA2) emphasises that any action a hearer is able to do is simply an 
action in nature. 
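
As a sanity check on the standard axioms for tests, composition and choice (A1, A4, A5 and A6), the sketch below verifies them by brute force over every relation and every proposition on a two-state model. It reads actions as binary relations on states and propositions as sets of states; this relational reading, and all names in the code, are assumptions of the sketch, and speaker and hearer annotations are left out.

```python
from itertools import combinations

STATES = (0, 1)


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for n in range(len(items) + 1)
            for c in combinations(items, n)]


def compose(r1, r2):
    """Relational composition: first r1, then r2."""
    return {(w, v) for (w, u) in r1 for (u2, v) in r2 if u == u2}


def guard(phi):
    """Relation of a test: stay in place wherever phi holds."""
    return {(w, w) for w in phi}


def box(rel, phi):
    """States from which every rel-successor satisfies phi."""
    return {w for w in STATES if all(v in phi for (u, v) in rel if u == w)}


def implies(phi, psi):
    """States satisfying the material implication phi -> psi."""
    return (set(STATES) - set(phi)) | set(psi)


RELATIONS = powerset((w, v) for w in STATES for v in STATES)
PROPOSITIONS = powerset(STATES)


def axioms_hold():
    for r1 in RELATIONS:
        for r2 in RELATIONS:
            for phi in PROPOSITIONS:
                # A4: box over a composition equals nested boxes
                if box(compose(r1, r2), phi) != box(r1, box(r2, phi)):
                    return False
                # A5: box over a choice equals the conjunction of boxes
                if box(r1 | r2, phi) != box(r1, phi) & box(r2, phi):
                    return False
        for phi in PROPOSITIONS:
            for psi in PROPOSITIONS:
                # A6: box over a test is an implication
                if box(guard(phi), psi) != implies(phi, psi):
                    return False
                # A1: test followed by an action
                if box(compose(guard(phi), r1), psi) != implies(phi, box(r1, psi)):
                    return False
    return True


print(axioms_hold())
```

An exhaustive check on two states is not a proof of validity; it only confirms that the reconstructed axiom schemes agree with the relational semantics on this small model.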



3.7 Inference Rules 

a) Modus Ponens (MP): If $\phi$ and $\phi \to \psi$ then $\psi$; b) Necessitation rule (Nec): If $\phi$ then 
$[\alpha]\phi$; c) Universal generalization (UG): If $\phi$ then $\forall x\phi$ provided $x$ is not free in $\phi$. 



3.8 Semantics 

The semantics for $L_{ImpA}$ is given as a possible worlds semantics. Formally, a model $m$ 
is defined to be the structure $\langle W, D, Val, \Delta, \delta, \eta, \nu, \tau, \kappa \rangle$, where 
$W = \{w_0, w_1, \ldots, w_n, \ldots\}$ is a set of worlds or states. $D$ is a non-empty set called 
the domain, composed of a) $Sp$, a set of agents playing the role of speakers, b) $Hr$, a set of 
agents playing the role of hearers and c) a set $D'$ of objects, such that $D = D' \cup Sp \cup Hr$. 
$Val$ is a function assigning a semantic value to each non-logical constant of $L_{ImpA}$, where 
such constants correspond to standard constants, functions, predicates and actions. 
$\Delta: AtAct \times D^n \Rightarrow 2^{W \times W}$ defines a set of pairs $(w, w')$ describing actions 
$a(\bar{d})$ such that starting in $w$ the occurrence of the action would lead to the state $w'$, 
where $\bar{d} = (d_1, d_2, \ldots, d_n)$ and $d_i \in D$ ($i = 1, \ldots, n$). 
$\delta: AtAct \times D^n \times Sp \times Hr \Rightarrow 2^{W \times W}$ defines a set of pairs of states 
$(w, w')$ describing requirements such that in the state $w$ a speaker is demanding some request 
of some hearer who is able to reach the state $w'$, where the request is considered satisfied. 
$\eta: AtAct \times D^n \times Hr \Rightarrow 2^{W \times W}$ defines a set of pairs of states $(w, w')$ 
describing actions such that starting in $w$ an agent (hearer) would be able to perform the action 
and so reach the state $w'$. $\nu$ is a valuation that assigns a semantic value (true or false) 
to predicates and formulae. $\tau$ is the environment function that assigns to each variable 
$x$ and world $w$ an element $d$ from $D$; $\tau$ is defined in Section 3.10 below. $\kappa$ provides 
the valuation for terms by using the environment function $\tau$ and $Val$. 



3.9 Semantics for Non-logical Constants 

$Val$ assigns values to the different kinds of constants as follows. If $c$ is a standard 
constant then $Val(c) = c^{Val} = d$ where $d \in D$. If $cs$ is a standard speaker constant then 
$Val(cs) = cs^{Val} = os$ where $os \in Sp$. If $ch$ is a standard hearer constant then 
$Val(ch) = ch^{Val} = oh$ where $oh \in Hr$. If $p$ is a predicate constant then $Val(p) = p^{Val}$ 
where $p^{Val} \subseteq D^n$. If $a$ is an $n$-ary action constant then $Val(a^n) = a^{Val}$ where 
$a^{Val} \subseteq W \times W \times D^n$. If $f$ is a function from $D^n$ to $D$, $Val(f) = f^{Val}$ 
where $f^{Val} \subseteq D^{n+1}$. 

3.10 Semantics for Terms 

a) Environment Function 

Let $\tau$ be the semantic function for variables defined as follows: $\tau: V \times W \Rightarrow D$, such 
that $\tau(x/d, w)$ is exactly like $\tau$, except that $\tau(x/d, w)$ assigns $d$ to $x$ in $w$. 

b) Semantics for Terms 

- If $x$ is a variable symbol then $\kappa(x)_{\tau,w} = \tau(x, w) = d$ such that $d \in D$ 

- If $c$ is a constant symbol then $\kappa(c)_{\tau,w} = Val(c) = d$ such that $d \in D$ 

- If $s$ is a speaker variable symbol then $\kappa(s)_{\tau,w} = \tau(s, w) = os$ such that $os \in Sp$ 

- If $h$ is a hearer variable symbol then $\kappa(h)_{\tau,w} = \tau(h, w) = oh$ such that $oh \in Hr$ 

- If $cs$ is a speaker constant symbol then $\kappa(cs)_{\tau,w} = Val(cs) = os$ such that $os \in Sp$ 

- If $ch$ is a hearer constant symbol then $\kappa(ch)_{\tau,w} = Val(ch) = oh$ such that $oh \in Hr$ 

- If $f(t_1, t_2, \ldots, t_n)$ is a function symbol from $D^n$ to $D$, then 
  $\kappa(f(t_1, t_2, \ldots, t_n))_{\tau,w} = Val(f)(\kappa(t_1)_{\tau,w}, \kappa(t_2)_{\tau,w}, \ldots, \kappa(t_n)_{\tau,w})$ 
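
The valuation of terms can be sketched as a short recursive function. In this illustration a term is either a string (a constant or a variable) or a tuple whose first element is a function symbol; the dictionaries `val` and `tau` and the symbols `betty`, `neighbour_of`, `h` and `w0` are hypothetical stand-ins for the constant valuation, the environment function and a world.

```python
def kappa(term, val, tau, world):
    """Value of a term at a world.

    term  -- a constant or variable name (str), or (function_symbol, arg1, ..., argn)
    val   -- values of constants and function symbols
    tau   -- environment: maps (variable, world) to an element of the domain
    """
    if isinstance(term, tuple):
        symbol, *args = term
        # Evaluate the arguments first, then apply the function's value.
        return val[symbol](*(kappa(t, val, tau, world) for t in args))
    if term in val:
        return val[term]            # constant (regular, speaker or hearer)
    return tau[(term, world)]       # variable, looked up per world


# Hypothetical model fragment
val = {
    "betty": "Betty",
    "neighbour_of": lambda d: {"Betty": "Alison"}[d],
}
tau = {("h", "w0"): "Betty"}

print(kappa("betty", val, tau, "w0"))
print(kappa("h", val, tau, "w0"))
print(kappa(("neighbour_of", "h"), val, tau, "w0"))
```

The sketch does not separate speaker, hearer and regular symbols; in the model those cases differ only in the part of the domain the value is drawn from.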



3.11 Semantics for Actions 

If $ts$ is a speaker term, $th$ is a hearer term and $\alpha$, $\alpha_1$, and $\alpha_2$ are actions with no 
references to speakers and hearers, we define the following semantic functions $\delta$, $\eta$. 
In order to avoid subscripts of subscripts we use the following notation: 

$\delta((\alpha)_{ts,th})_{\tau,w} = \delta(\alpha, ts, th)_{\tau,w}$ and $\eta((\alpha)_{th})_{\tau,w} = \eta(\alpha, th)_{\tau,w}$. 

a) Semantics for Requirements (Triples Involving Speaker-Action-Hearer) 

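- Atomic requirements: 

If $a(t_1, t_2, \ldots, t_n) \in AtAct$ then 
$\delta((a(t_1, t_2, \ldots, t_n))_{ts,th})_{\tau,w} = \delta(a(t_1, t_2, \ldots, t_n), ts, th)_{\tau,w} =$ 
$\{(w_0, w) \mid (w_0, w) \in Val(a)(\kappa(t_1)_{\tau,w}, \kappa(t_2)_{\tau,w}, \ldots, \kappa(t_n)_{\tau,w}, \kappa(ts)_{\tau,w}, \kappa(th)_{\tau,w})\}$ 

- Composition of requirements: 

$\delta((\alpha_1;\alpha_2)_{ts,th})_{\tau,w} = \delta(\alpha_1;\alpha_2, ts, th)_{\tau,w} = \delta(\alpha_1, ts, th)_{\tau,w} \circ \delta(\alpha_2, ts, th)_{\tau,w} =$ 
$\{(w_0, w_2) \mid$ there exists $w_1$ such that $(w_0, w_1) \in \delta(\alpha_1, \kappa(ts)_{\tau,w}, \kappa(th)_{\tau,w})_{\tau,w}$ and 
$(w_1, w_2) \in \delta(\alpha_2, \kappa(ts)_{\tau,w}, \kappa(th)_{\tau,w})_{\tau,w}\}$ 

- Disjunction of requirements: 

$\delta((\alpha_1+\alpha_2)_{ts,th})_{\tau,w} = \delta(\alpha_1+\alpha_2, ts, th)_{\tau,w} = \delta(\alpha_1, ts, th)_{\tau,w} \cup \delta(\alpha_2, ts, th)_{\tau,w} =$ 
$\{(w_0, w_1) \mid (w_0, w_1) \in \delta(\alpha_1, \kappa(ts)_{\tau,w}, \kappa(th)_{\tau,w})_{\tau,w}$ or $(w_0, w_1) \in \delta(\alpha_2, \kappa(ts)_{\tau,w}, \kappa(th)_{\tau,w})_{\tau,w}\}$ 

The semantic functions $\eta$ for ability of agents and $\Delta$ for actions are defined 
analogously, with the corresponding number of arguments. 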

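b) Mixing Requirements $((\alpha)_{ts,th})$, Ability $((\alpha)_{th})$ and Actions $(\alpha)$ 

If $\alpha$, $\alpha_1$ and $\alpha_2$ are actions with no references to speakers and hearers then we define 
the semantic function $\Gamma$ as follows: a) $\Gamma(\alpha)_{\tau,w} = \Delta(\alpha)_{\tau,w}$, 
b) $\Gamma((\alpha)_{th})_{\tau,w} = \eta((\alpha)_{th})_{\tau,w}$ and 
c) $\Gamma((\alpha)_{ts,th})_{\tau,w} = \delta((\alpha)_{ts,th})_{\tau,w}$. Now for any actions $\alpha$, $\alpha_1$ and 
$\alpha_2$ in $Act$, even involving reference to speakers and hearers, $\Gamma$ is defined as follows. 
i) $\Gamma(\alpha_1;\alpha_2)_{\tau,w} = \Gamma(\alpha_1)_{\tau,w} \circ \Gamma(\alpha_2)_{\tau,w} = \{(w, w') \mid \exists w''$ such that 
$(w, w'') \in \Gamma(\alpha_1)_{\tau,w}$ and $(w'', w') \in \Gamma(\alpha_2)_{\tau,w}\}$; 
ii) $\Gamma(\alpha_1+\alpha_2)_{\tau,w} = \Gamma(\alpha_1)_{\tau,w} \cup \Gamma(\alpha_2)_{\tau,w} = \{(w, w') \mid (w, w') \in \Gamma(\alpha_1)_{\tau,w}$ 
or $(w, w') \in \Gamma(\alpha_2)_{\tau,w}\}$. 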

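3.12 Semantics for Expressions in the Language $L_{ImpA}$ 

- $\nu(p(t_1, t_2, \ldots, t_n))_{\tau,w}$ = true iff $\langle\kappa(t_1)_{\tau,w}, \kappa(t_2)_{\tau,w}, \ldots, \kappa(t_n)_{\tau,w}\rangle \in Val(p)$. 

$\nu$ defines the set of states where the predicate $p(t_1, t_2, \ldots, t_n)$ is true and $Val(p) \subseteq D^n$. 

- $\nu(t_1 = t_2)_{\tau,w}$ = true iff $\kappa(t_1)_{\tau,w} = \kappa(t_2)_{\tau,w}$ 

- $\nu(\neg\phi)_{\tau,w}$ = true iff $\nu(\phi)_{\tau,w}$ = false 

- $\nu(\phi_1 \land \phi_2)_{\tau,w}$ = true iff $\nu(\phi_1)_{\tau,w}$ = true and $\nu(\phi_2)_{\tau,w}$ = true 

- $\nu(\exists x\phi)_{\tau,w}$ = true iff there exists an element $d$ in $D$ such that $\nu(\phi)_{\tau(x/d,w),w}$ = true 

$\tau(x/d, w)$ is exactly like $\tau$ except that $\tau(x/d, w)$ assigns $d$ to $x$. 

- $\nu([\alpha]\phi)_{\tau,w}$ = true iff for every $w'$ (if $(w, w') \in \Gamma(\alpha)_{\tau,w}$ then $\nu(\phi)_{\tau,w'}$ = true) 

- $\nu(\langle\alpha\rangle\phi)_{\tau,w}$ = true iff there exists $w'$ ($(w, w') \in \Gamma(\alpha)_{\tau,w}$ and $\nu(\phi)_{\tau,w'}$ = true) 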

3.13 Soundness 

Soundness of $L_{ImpA}$ follows from the soundness of first-order dynamic logic as defined 
by Harel [7]. 

Notation: Truth at state $w$ of an arbitrary formula $\phi$ under $L_{ImpA}$, for any valuation $\tau$ 
and the model $m$, is inductively defined using the notation $w \in \nu(\phi)_{\tau,w}$ and simply 
abbreviated as $w \models \phi$. When $\phi$ is not true at $w$ under $L_{ImpA}$ we can write $w \not\models \phi$. 
If $\phi$ is valid in $m$ we write simply $\models \phi$. 

3.14 Truth with Respect to a Context 

If we have a description of a context, that is, a collection of consistent propositions, 
we can define the truth of a proposition ($\phi$) with respect to a context as follows. Let 
$k = \{\phi_1, \phi_2, \ldots, \phi_n\}$ represent our context, where for $i = 1, \ldots, n$, $\phi_i \in FOR$. We may 
also identify the set of states defined by our context as follows: $\nu'(k) = \{w \mid$ for every 
$\phi \in k$, $w \models \phi\}$. Now we can define truth with respect to a context. If $k$ represents a 
context and $\phi \in FOR$, we use the symbol ‘$\models$’ and the notation $k \models^{m}_{\tau} \phi$ to indicate that 
$\phi$ is true in the model $m$ with respect to context $k$, for any assignment $\tau$. We abbreviate 
$k \models^{m}_{\tau} \phi$ simply as $k \models \phi$. When $\phi$ is not true at $k$ under $L_{ImpA}$ we can write 
$k \not\models \phi$. Thus, if $\phi \in FOR$, $k \models \phi$ iff for every $w$, if $w \in \nu'(k)$ then $w \models \phi$. In this 
model we assume that in expressions involving more than one agent, context represents a 
common set of beliefs shared by the agents involved. 
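
Truth with respect to a context can be sketched for a purely propositional fragment: a state is an assignment of truth values to atoms, a proposition is a predicate over states, and a formula is true in context k when it holds in every state that satisfies all propositions of k. The atoms `raining` and `window_open` and the function names are assumptions made for this illustration.

```python
from itertools import product

ATOMS = ("raining", "window_open")


def all_states():
    """Every assignment of truth values to the atoms."""
    return [dict(zip(ATOMS, values))
            for values in product([False, True], repeat=len(ATOMS))]


def states_of(context):
    """States in which every proposition of the context holds."""
    return [w for w in all_states() if all(p(w) for p in context)]


def true_in_context(context, phi):
    """phi is true w.r.t. the context iff it holds in all of its states."""
    return all(phi(w) for w in states_of(context))


def raining(w):
    return w["raining"]


def window_open(w):
    return w["window_open"]


def raining_implies_window_open(w):
    return (not w["raining"]) or w["window_open"]


def window_closed(w):
    return not w["window_open"]


k = [raining, raining_implies_window_open]

print(true_in_context(k, window_open))            # follows from k
print(true_in_context([raining], window_open))    # k says nothing about it
print(true_in_context([raining], window_closed))  # nor about its negation
```

A context may leave a proposition undecided, as the last two calls show. An inconsistent context would have no states and make everything vacuously true, which is one reason the definition requires consistency.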



3.15 Hoare Style Rules 

The following are derived rules, which operate between actions that might or 
might not make reference to speakers and hearers. $Pre$ usually indicates pre- 
conditions and $Pos$ post-conditions. We assume the equality $Pre\{\alpha\}Pos = Pre \to [\alpha]Pos$. 
The derived rules, (I;) Introduction for composition, (I+) Introduction for disjunction 
and (I$\Rightarrow$) Introduction for conditional, are given below. 

(I;) If $\vdash Pre \to [\alpha_1]Pos'$ and $\vdash Pos' \to [\alpha_2]Pos$ then $\vdash Pre \to [\alpha_1;\alpha_2]Pos$. 

(I+) If $\vdash Pre \to [\alpha_1]Pos$ and $\vdash Pre \to [\alpha_2]Pos$ then $\vdash Pre \to [\alpha_1+\alpha_2]Pos$. 

(I$\Rightarrow$) If $\vdash (Pre \land \phi) \to [\alpha]Pos$ and $\vdash (Pre \land \neg\phi) \to Pos$ then $\vdash Pre \to [\phi?;\alpha]Pos$. 
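
As a worked instance of the introduction rule for composition, consider the conjunctive imperative Sit down and listen carefully! from Section 2. The propositions standing, seated and listening and the atomic actions sit_down and listen are hypothetical names chosen for this illustration; the two premises are assumed, not derived.

```latex
\begin{align*}
&\vdash \mathit{standing} \to [\mathit{sit\_down}]\,\mathit{seated} \\
&\vdash \mathit{seated} \to [\mathit{listen}]\,(\mathit{seated} \land \mathit{listening}) \\
&\Longrightarrow\quad \vdash \mathit{standing} \to
   [\mathit{sit\_down};\mathit{listen}]\,(\mathit{seated} \land \mathit{listening})
\end{align*}
```

Here the post-condition of the first action (seated) plays the role of the intermediate condition shared by the two premises, so the pre-condition of the whole sequence is that of its first component and the post-condition is that of its last.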



3.16 Correctness for Imperatives 

Having all this infrastructure to represent and verify requirements, we can formalize 
the definition of correctness for imperatives. 

Definition: Correctness of an Imperative 

Given a requirement $(\alpha)_{s,h}$, prescribed by an imperative utterance $Imp(k, P, 
(\alpha)_{s,h}, Q)$, we say that $Imp$ is correct w.r.t. context $k$ iff $k \models P \to [(\alpha)_{s,h}]Q$ for 
appropriate pre- and post-conditions $P$ and $Q$. 

Note that this definition of correctness is only a case of the more general definition 
$k \models P \to [\alpha]Q$, which defines the correctness of any action in $L_{ImpA}$. This includes 
requirements, ability of agents and actions in general. 

3.17 Encapsulating Uttered Requirements 

In order for an agent to distinguish what is uttered from what is not, we encapsulate as 
follows. 

Definition: Set of Requirements 

Let $\sigma_k = \langle(\alpha_1)_{s,h}, (\alpha_2)_{s,h}, \ldots, (\alpha_n)_{s,h}\rangle$ be a set of requirements demanded in 
context $k$, such that $\alpha_1, \alpha_2, \ldots, \alpha_n$ represent actions prescribed by 
imperative sentences. $s$ and $h$ represent the agents playing the roles of 
speaker and hearer respectively. 

Note that $\sigma_k$ allows the distinction between demanded and derived actions. On the 
other hand, there is the implicit assumption that all requirements in $\sigma_k$ are supposed to 
be satisfied as long as $\sigma_k$ is correct. 

Definition: Correctness of a Set of Requirements 

A set $\sigma_k$ is correct with respect to context $k$ iff $k \models P\{(\alpha_1)_{s,h};(\alpha_2)_{s,h};\ldots;(\alpha_n)_{s,h}\}Q$ 
for appropriate pre- and post-conditions $P$ and $Q$. 



4 Model at Work 

In the examples below we assume that k, the context, represents not the set of beliefs of 
a particular agent, but rather a subset that represents a set of common beliefs 
of the hearer and speaker involved. 

a) Uttered and Derived Requirements 

Let us assume that Helen says to Betty, Love your neighbour as yourself! Betty 
should be able to encapsulate the requirement such that 
$\sigma_k = \langle(\textit{Love your neighbour as yourself!})_{Helen,Betty}\rangle$. We can paraphrase the order 
as a conditional requirement, where $\alpha(x)$ = Love x as yourself, $\phi(x)$ = x is your neighbour, 
$Q(x)$ = You love x as yourself and $P(x) = \neg Q(x)$. Thus, the Hoare’s triple of the imperative is 
$\forall x P(x) \to [(\phi(x) \Rightarrow \alpha(x))_{Helen,Betty}]Q(x)$. If we assume that the requirement is correct 
w.r.t. k, then $k \models \forall x P(x) \to [(\phi(x) \Rightarrow \alpha(x))_{Helen,Betty}]Q(x)$. This means that for both 
Helen and Betty, the requirement is acceptable according to their beliefs. If furthermore it is 
the case that $\phi(Alison)$ = Alison is your neighbour, then we can derive as follows. 

1) $k \models \forall x P(x) \to [(\phi(x) \Rightarrow \alpha(x))_{Helen,Betty}]Q(x)$    assumption 

2) $k \models \forall x (P(x) \land \phi(x)) \to [(\alpha(x))_{Helen,Betty}]Q(x)$    1), axiom A1) 

3) $k \models \phi(Alison)$    assumption 

4) $k \models (P(Alison) \land \phi(Alison)) \to [(\alpha(Alison))_{Helen,Betty}]Q(Alison)$    2), Univ. Inst. 

5) $k \models P(Alison) \land \phi(Alison)$    3), 4), Int. Conj. 

6) $k \models [(\alpha(Alison))_{Helen,Betty}]Q(Alison)$    4), 5), MP 

In 6) Betty would derive the requirement of loving Alison from the original 
request by Helen, given that she is one of her neighbors. However, that is not an 
uttered requirement: $(\alpha(Alison))_{Helen,Betty} \notin \sigma_k$. 
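
The distinction drawn in this example can be sketched in a few lines: the uttered requirements are stored as they were said, derived requirements are computed from them and from the context, and the two collections are kept apart. The class `Requirement`, the function `derive` and the set `neighbours_of_betty` are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    action: str
    speaker: str
    hearer: str


# The set of uttered requirements: only what Helen actually said.
uttered = (Requirement("love your neighbour as yourself", "Helen", "Betty"),)

# Hypothetical fact shared in the context k.
neighbours_of_betty = {"Alison"}


def derive(requirement, neighbours):
    """Instantiate the conditional requirement for every known neighbour."""
    return {
        Requirement(f"love {name} as yourself",
                    requirement.speaker, requirement.hearer)
        for name in sorted(neighbours)
    }


derived = derive(uttered[0], neighbours_of_betty)
love_alison = Requirement("love Alison as yourself", "Helen", "Betty")

print(love_alison in derived)   # Betty can derive it
print(love_alison in uttered)   # but Helen never uttered it
```

Keeping `uttered` immutable and separate from `derived` is the encapsulation idea in miniature: inference adds to what the hearer should do without adding to what the speaker said.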

b) No Choice 

Let us assume that now Helen says to Betty, Talk to the president! Betty would 
distinguish this uttered requirement as follows: $\sigma_k = \langle(\textit{Talk to the president})_{Helen,Betty}\rangle$. 
We can paraphrase the order such that $\alpha$ = Talk to the president, $Q$ = You have talked 
to the president and $P = \neg Q$. Thus, the Hoare’s triple of the imperative is 
$P \to [(\alpha)_{Helen,Betty}]Q$. If we assume that the requirement is correct w.r.t. k, then 
$k \models P \to [(\alpha)_{Helen,Betty}]Q$. This means that for both Helen and Betty, the requirement of 
talking to the president is acceptable according to their beliefs. 

If we assume that $\beta$ = Kill the president, Betty and Helen cannot introduce a 
disjunction such that Betty believes that a choice has been uttered and given to her. 
That is, $\sigma_k \neq \langle(\alpha)_{Helen,Betty} + (\beta)_{Helen,Betty}\rangle$. On the other hand, even a verification 
of a choice might be incorrect, that is, $k \not\models P \to [(\alpha)_{Helen,Betty} + (\beta)_{Helen,Betty}]Q$. There 
might be a clash between this verification and Betty’s beliefs. 

c) Impossible Requirements 

Let us assume that now Helen says to Betty, Have three arms! Betty would 
distinguish this uttered requirement as follows: $\sigma_k = \langle(\textit{Have three arms})_{Helen,Betty}\rangle$. We 
can paraphrase the order such that $\alpha$ = Have three arms, $Q$ = You have three arms and 
$P = \neg Q$. Thus, the Hoare’s triple of the imperative is $P \to [(\alpha)_{Helen,Betty}]Q$. In this case, 
and under normal circumstances, there would be a clash between this verification and 
Betty’s beliefs. Betty’s clash can be represented by the following 
expression, $k \not\models P \to \langle(\alpha)_{Helen,Betty}\rangle Q$, which means that there is no state she can reach 
by doing something so that she can have three arms. In terms of ability we can 
express this as $k \not\models P \to \langle(\alpha)_{Betty}\rangle Q$, which means that Betty does not believe that she is 
able to perform the action of having three arms. 
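
The detection of an impossible requirement can be sketched with the possibility operator over a finite set of states. The ability of the hearer is read as a relation between states, and the check asks whether, from every state where the pre-condition holds, some state satisfying the post-condition is reachable. The state names and function names below are assumptions of this sketch, and the context is simplified to the full set of states.

```python
def diamond(rel, phi, states):
    """States from which some rel-successor satisfies phi."""
    return {w for w in states
            if any((w, v) in rel and v in phi for v in states)}


def satisfiable_everywhere(pre, rel, post, states):
    """Does pre -> <action> post hold in every state?"""
    reachable = diamond(rel, post, states)
    return all((w not in pre) or (w in reachable) for w in states)


# 'Have three arms!': Betty has no action that leads to a three-armed state.
arm_states = {"two_arms", "three_arms"}
three_arms = {"three_arms"}            # post-condition Q
not_three_arms = arm_states - three_arms   # pre-condition P = not Q
betty_can_grow_arm = set()             # empty ability relation

# 'Open the door!': the hearer can move from closed to open.
door_states = {"closed", "open"}
hearer_can_open = {("closed", "open")}

print(satisfiable_everywhere(not_three_arms, betty_can_grow_arm,
                             three_arms, arm_states))
print(satisfiable_everywhere({"closed"}, hearer_can_open,
                             {"open"}, door_states))
```

The first check fails because the ability relation is empty, which corresponds to Betty not believing that any action of hers leads to having three arms; the second succeeds for the door example.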



5 Conclusions and Future Work 

We have presented a model in which agents that possess a reasoning ability are able 
to interpret imperative sentences. This does not suffer from the inferential problems 
faced by other approaches to the interpretation of imperatives. 

It is assumed that by various means (order, advice, request, etc.) imperatives 
convey requirements. The dynamic aspect of imperatives allows us to envisage that 
the connectives between imperatives behave similarly but not identically to classical 
logic connectives. A set of dynamic operators is used instead (disjunction (+), 
composition (;), conditional imperative ($\Rightarrow$)). Following Hoare, an introduction rule is 
provided for each of these operators. 

The features of the model presented here are that it captures the main aspects of 
imperatives (including the lack of truth-values), and that it corresponds to our 
intuitions about the behavior of imperative sentences. 

The model presented here is useful for verifying imperatives or sequences of 
imperatives, but it is not able to infer new utterances. This distinction between 
derived and uttered requirements allows us to avoid certain paradoxes. 

Propositions and imperatives interact within the model. It allows us to verify the 
appropriate use of imperatives (correctness). Verification of correctness provides a 
legitimation procedure, and it is able to detect impossible requirements. 

There are many possible extensions of this model, for instance the explicit 
inclusion of time. The introduction of “contrary to duty” imperatives (Prakken and 
Sergot [12]; Alarcon-Cabrera [1]) would be another example. 

In the near future we want to implement this model in a computer system so that it 
can be used in natural language interfaces. At the moment we are working on the 
syntactic analysis of imperatives. 



Abstract. This paper proposes a dynamically changing knowledge system. Each 
belief in the system has a component reflecting the strength of support from 
other people. The system is capable of adapting to a contextual situation by 
means of the continuous revision of belief strengths through interaction with 
others. As a paradigmatic application of the proposed socially-supported belief 
system, a parser was designed and implemented in CLOS. The parser outputs 
(a) speaker intention, (b) conveyed meaning, and (c) hearer's emotion. 



1 Introduction 

A language user’s belief structure is one of the important cues for ‘proper’ utterance 
interpretation. In this paper, we define the belief structure as a knowledge structure 
with a degree of subjective confidence. To understand an utterance ‘properly’ means 
to infer ‘as exactly as possible’ what the speaker intends. Of course, contextual 
information is also important to proper utterance interpretation. However, we often 
cannot properly interpret utterances even when provided with considerable contextual 
information. Reference to the belief structure enables an utterance’s meaning to be 
properly interpreted with little or no reference to the situation. For example, when the 
speaker utters “This dish tastes good” there are several possible interpretations for the 
hearer, such as “Father thinks mother has cooked a good meal and praises her,” or “A man 
thinks the restaurant has served a good meal and requests one more.” To retrieve the 
speaker’s intention, the first interpretation is automatically decided in the speaker’s 
mind by referring to his/her belief. In analyzing such phenomena we focus on the role 
of each language user’s belief structure and propose that the belief structure is built 
up by reflecting other people’s beliefs through social interactions. 



2 How to Treat the Proper Context 

Although contextual information is considered essential for utterance interpretation 
in traditional pragmatics, the problem of how the hearer infers the appropriate context of 
the utterance is not addressed. In Gricean theory (e.g. [2]), context is dealt with only 
as obvious and given information. Computational pragmatics decomposes the 
problem into the steps involved in plan recognition (e.g. [1]); context is therefore 
defined in terms of data structures. Relevance Theory ([5]) takes a more satisfying 
approach, asserting that assumptions are the major cue for utterance interpretation. It 
states that assumptions are built on the basis of cognition for optimal relevance. This 
approach is similar to ours, but it downplays the fact that the cognitive computation 
for building the assumptions has not yet been described in detail. This is a point we 
address. 

We also introduce the new idea of meaning being supported collectively with the 
help of other language users in the community ([3]). In our approach, context is not 
merely a data structure or given information, but rather a support mechanism existing 
between the cognitive systems of the members of a particular linguistic community. 



3 The Socially-Supported Belief System 

The Socially-supported Belief System (SBS) is a computational model of utterance 
interpretation which incorporates the dynamical revision of the belief system of a 
given language user ([4]). The belief revising system models the user’s linguistic and 
world knowledge with a component representing the degree of support from others in 
the community. A key concept in this system is the socially-supported belief (sb). 

As a hearer model, the task for SBS is to disambiguate the intention of the speaker 
by using its sb database. The hearer builds his/her own belief structure in the form of 
sbs. Each sb has a value, representing the strength of support from others. The belief 
which has the highest level of support is considered the most likely interpretation 
of a message, with the ranking order of a knowledge-belief revised dynamically 
through interaction. Figure 1 depicts the general architecture of the SBS model. 
Ordinary models of utterance interpretation do not include the emotional responses 
of the hearer, although many utterances elicit emotional reactions. We believe that 
emotional reactions are an inherent function of utterance exchange and that the 
process is best captured as socially-supported belief processing. In our system, 
emotions are evoked by sbs activated by input words. When the SBS determines the 
final utterance function as a speaker’s intention, it extracts the associated emotions 
from the utterance function. For the utterance function “praise”, the SBP 
(Socially-supported Belief Parser) searches the emotion knowledge for the word 
“praise” and extracts the emotion “happy.” 
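
The ranking idea described above can be sketched as a small weighted store: each candidate interpretation of an utterance carries a support weight, interaction with others adds to the weights, and the best-supported interpretation wins. This is only an illustration in Python under assumed names (`BeliefStore`, `observe`, `best`, `EMOTIONS`); it is not the CLOS parser, and the weights are invented.

```python
from collections import defaultdict


class BeliefStore:
    """Toy store of socially-supported beliefs (sbs) with support weights."""

    def __init__(self):
        # (utterance, interpretation) -> accumulated support from others
        self._support = defaultdict(float)

    def observe(self, utterance, interpretation, weight=1.0):
        """Record support for an interpretation, e.g. after an interaction."""
        self._support[(utterance, interpretation)] += weight

    def rank(self, utterance):
        """Interpretations of the utterance, best supported first."""
        candidates = [(interp, w)
                      for (utt, interp), w in self._support.items()
                      if utt == utterance]
        return sorted(candidates, key=lambda item: (-item[1], item[0]))

    def best(self, utterance):
        ranked = self.rank(utterance)
        return ranked[0][0] if ranked else None


# Emotion knowledge keyed by utterance function (only the pair from the text).
EMOTIONS = {"praise": "happy"}

store = BeliefStore()
store.observe("This dish tastes good", "praise", weight=2.0)
store.observe("This dish tastes good", "request", weight=1.0)

first = store.best("This dish tastes good")
emotion = EMOTIONS.get(first)

# Further interaction shifts the support, and with it the ranking.
store.observe("This dish tastes good", "request", weight=2.0)
second = store.best("This dish tastes good")

print(first, emotion, second)
```

The second lookup returns a different interpretation from the first, which is the dynamic revision the system relies on: no belief is fixed, only better or worse supported at a given moment.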

The SBS has its roots in Web search research. In a Web search situation, little 
or no information concerning the context of the search or concerning the searcher is 
available, yet a search engine is expected to function as if it knew all the contextual 
information surrounding the target words. The main similarity between the SBS 
and search engines is in the weighted ordering of the utterance’s meaning built by the 
system and the presumed ordering of found URLs by the search engines. Support 
from others (other sites, in the case of Web searching) is the key idea in both cases 
([4]). 

Fig. 1. The general architecture of the Socially-supported Belief System: the bottom of the right 
part of the figure represents the hearer’s community (for instance, a community of language, 
gender, or generation, etc.). The right upper part shows the belief structure of the hearer. The 
hearer communicates with the members of the community, and he/she takes their beliefs into 
his/her belief structure. Each belief has a weight and the hearer’s belief structure is revised at 
each communication. The left part of the figure represents the utterance interpreter (whose 
work is explained in detail in 6.) 



4 Metaphor Comprehension 

We now discuss an analysis of descriptions of how readers comprehend the metaphor 
“Life is a journey” (in Japanese). We deal with 302 propositions for 15 
descriptions. The participants, who are college students, were instructed to describe 
interpretations, thoughts, emotions, and associations in reading the metaphor “Life is a 
journey.” After extracting propositions from the descriptions, we categorized them 
depending on their content. According to the analysis ([3]), there are some agreements 
in participants’ cognitive processes. In the data, we can find some utterances 
which show that the reader refers to his/her belief structure. Some readers describe 
having seen the metaphor in a Japanese textbook, and they tend to consider the metaphor 
as a ‘lesson’ or a ‘moral’ ([3]). Considering that there is no contextual information 
about the metaphor in this experiment, this description suggests that the reader 
interprets the metaphor according to his/her own belief structure. 



5 Implementation 

Our intention inference parser has been implemented in the Common Lisp Object System 
(CLOS). We designed the SBS as a word-based parser, which can deal with various 
types of data structures. The system has its own belief database, which it revises 
dynamically based on the input beliefs. The SBS processes a sequence of 
input words by using its word knowledge, which consists of three types of knowledge: 
(a) grammatical knowledge (controls the current word’s syntactic behavior), (b) semantic 
knowledge (maintains the sb database derived from the current word), and 
(c) discourse knowledge (accommodates information about speakers 
and utterance functions). Each knowledge type is represented as a daemon, a unit of 
program code executable in a constraint-satisfaction mechanism. The SBS parses 
each input word using the sb connected to that word. It can retrieve the intention 
even from incomplete utterances, because the proposed parser is strictly word-based. 
Whenever an interaction with other language users occurs, the weight of each 
sb is revised, so the hearer can revise his/her own belief system 
and consult it for utterance interpretation. 

As a result, the SBS determines (i) word meaning, (ii) utterance meaning, (iii) utterance 
function, (iv) utterance function, (v) the speaker, and (vi) the hearer’s emotion. 
Even without any contextual information, it can do all of these because it constructs context from 
the socially-supported beliefs in its system. In addition, because it refers to others’ beliefs, 
it can always reach a stable state of meaning within the particular community 
to which the hearer belongs. 
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Abstract. Domain shift is a challenging issue in dialogue management. This 
paper shows how to extract domain knowledge for dialogue model adaptation. 
The basic semantic concepts are derived from a domain corpus by iterative token 
combination and contextual clustering. Speech acts are identified by using 
semantic clues within an utterance. Frame states summarize the current dialogue 
condition, and state transitions capture the mental agreement between users and 
the system. Both Bayesian and machine learning approaches are tested for 
identification of speech acts and prediction of the next state. To test the feasibility of 
this model adaptation approach, four corpora from the domains of hospital 
registration service, telephone inquiring service, railway information service 
and air traveling information service are adopted. The experimental results 
demonstrate good portability across domains. 



1 Introduction 

Dialogue management provides a rich human-computer interaction which allows 
users to convey more complex information than a single utterance. Despite the 
recent significant progress in the areas of human language processing, building 
successful dialogue systems still requires large amounts of development time and 
human expertise [1]. The major challenge is introducing new domain knowledge 
into the dialogue model when the domain is shifted: handcrafting the domain 
knowledge that a dialogue manager needs usually takes time. In the past, 
some papers [2,6] dealt with acquisition and clustering of grammatical fragments for 
natural language understanding, and some papers [4,9] employed statistical 
techniques for recognizing speech intentions. This paper focuses on how to 
extract crucial domain knowledge, including semantic concept extraction, speech act 
identification and formulation of dialogue state transition. Four corpora from different 
domains are employed to test the feasibility. 
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2 Corpora of Different Domains 

Two dialogue corpora and two single-query corpora were studied. They belong to the 
domains of hospital registration service, telephone inquiring service, railway 
information service and air traveling information service. Tables 1 and 2 summarize 
the statistics of the materials. The dialogues in the NTUH corpus, which were transcribed 
from face-to-face conversation in Chinese, deal with tasks at a registration counter of 
NTU hospital, including registration, cancellation, information seeking, etc. The 
Chinese CHT corpus was transcribed from telephone calls to the Chun-Hwa Telecom phone 
number inquiring system. Compared with the NTUH corpus, most of the 
utterances in the CHT corpus are very short and incomplete, because people 
often address the targets directly when using a phone number inquiring service. 



Table 1. Dialogue Corpora 

Corpus name            NTU Hospital (NTUH)                    Chun-Hwa Telecom (CHT)
Content                Register (make appointment), cancel,   Inquire phone number or
                       inquire info                           other info
Number of dialogues    13                                     98
Number of utterances   440                                    1923
Average length         33.5                                   16.4



Besides the dialogue corpora, two Chinese query corpora, the Taiwan Railway 
Corpus (TWR) and the Air Traveling Information Service corpus (CATIS), which include 
queries about train timetables and air traveling information, respectively, were 
employed. CATIS is a Chinese version of ATIS [7]: all the airline booking 
information, e.g., location names, airline names, etc., is translated into Chinese. 
CATIS is much larger, and it contains more unknown words than the TWR corpus. 



Table 2. Corpora of Single Utterances 

Corpus name            Taiwan Railway (TWR)       Air Traveling (CATIS)
Content                Train timetable queries    Airline booking queries
Number of utterances   200                        5517
Average length         29                         32



3 Acquisition of Semantic Concepts 

A semantic concept refers to a key entity in a domain that users have to fill, answer or 
mention to accomplish the desired tasks. Concepts come in different forms, e.g., 
they could be database attributes, certain key verbs, or some types of named entities. 
A concept may have several values, e.g., a destination station in the railway corpus may 
be any location name. In the proposed data-driven methodology, token combination 
is performed first to combine tokens, and then contextual clustering is employed to 
gather terms with similar contexts. The combined tokens are labeled to create a 
modified corpus for another iteration of token combination and contextual clustering. 

The NTUH dialogue corpus is adopted for the experiment. The Chinese corpus is 
segmented and tagged with parts of speech. Unseen words like person names are 
often the major source of segmentation errors. For example, a doctor’s name 
(Yang Shi Yi) was segmented into three individual characters. Named entity 
recognition [3] attempts to merge such individual characters. Besides named entities, 
certain word strings denote potential semantic concepts. We group terms that tend to 
co-occur in the NTUH corpus by mutual information, defined as follows. They form 
phrases or multi-word entities. 



$$MI(e_1, e_2) = P(e_1, e_2) \log \frac{P(e_1, e_2)}{P(e_1)\,P(e_2)} \qquad (1)$$



Terms of the same semantic concept are often represented with similar 
utterance structure and neighboring context. For example, utterances meaning 
“I want the phone number of First Bank” and “I want the telephone number of 
National Taiwan University” are used in the CHT corpus. The two 
target elements (First Bank and National Taiwan University) have similar left and 
right contexts, which shows that two different names 
denote similar concepts. Kullback-Leibler distance [5] is used to measure the 
similarity between two contexts, where V denotes the vocabulary used in contexts. 

$$D(p_1 \| p_2) = \sum_{i \in V} p_1(i) \log \frac{p_1(i)}{p_2(i)} \qquad (2)$$

$D(p_1 \| p_2) = 0$ if $p_1$ and $p_2$ are equivalent, i.e., they have exactly the same neighboring 
context. Terms with large MI are merged into a larger term, and terms with small KL 
distance are grouped into the same cluster. 
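
The two measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the estimation of probabilities from raw unigram and adjacent-pair counts, and the small `eps` smoothing term that avoids taking the logarithm of zero are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(tokens):
    """Score adjacent token pairs with P(e1,e2) * log(P(e1,e2) / (P(e1) P(e2))).

    Probabilities are relative frequencies of unigrams and adjacent pairs
    (an assumed estimator; the paper does not spell one out).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (e1, e2), count in bigrams.items():
        p12 = count / n_bi
        p1 = unigrams[e1] / n_uni
        p2 = unigrams[e2] / n_uni
        scores[(e1, e2)] = p12 * math.log(p12 / (p1 * p2))
    return scores


def kl_distance(p1, p2, vocabulary, eps=1e-9):
    """Kullback-Leibler distance D(p1 || p2) over a context vocabulary.

    p1 and p2 map context words to probabilities; eps is an assumed
    smoothing constant so that unseen words do not produce log(0).
    """
    total = 0.0
    for word in vocabulary:
        a = p1.get(word, 0.0) + eps
        b = p2.get(word, 0.0) + eps
        total += a * math.log(a / b)
    return total


# Hypothetical toy data: the pair that co-occurs most strongly is merged first.
tokens = ["first", "bank", "phone", "first", "bank"]
scores = mutual_information(tokens)
best_pair = max(scores, key=scores.get)
```

In an iterative procedure of the kind described above, the highest-scoring pairs would be merged into single tokens, and terms whose context distributions have a small `kl_distance` would be placed in the same cluster.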

Experimental results are judged by human assessors. The results of token 
combination are rated on four levels: correct word, correct phrase, nonsense 
(i.e., a wrong combination) and longer (i.e., a term that contains both correct and incorrect 
combinations from previous iterations). Figure 1 shows that the formation of words starts 
at the beginning of token combination and grows steadily until the middle of the process 
(around iteration 30). Phrase combinations appear later, at the 15th iteration, and their 
number does not increase as fast as that of word combinations. Because the number of 
phrases is smaller and the meaningful combinations are formed from the word level up to the 
phrase level, the number of nonsense and longer combinations increases rapidly in the 
later iterations. Error propagation appears after 10-15 iterations and accounts for most 
nonsense combinations, as the corresponding curve runs approximately along the nonsense curve. 

Fig. 1. Quantitative Result of Token Combination 

Contextual clusters are rated on three levels: correct (i.e., all the clustered 
terms belong to the same semantic concept), wrong (i.e., all the terms are unrelated to 
one another), and part (i.e., some of the clustered items belong to the same concept and 
some do not). Figure 2 shows that the curves of correct (meaningful) and wrong 
(meaningless) clusters have the same tendency. Both increase rapidly in the first 5 
iterations, and then the increase slows down. As the iterations proceed, a cluster 
containing totally unrelated terms seldom occurs, because terms with clear 
evidence have been correctly or partly clustered in the previous iterations. Error 
propagation begins at the 10th iteration and grows with the number of “part” 
clusters. The number of “part” clusters stops growing after 30 iterations. These 
clusters are in fact too unclear for human assessors to judge. We can see that the 
meaningful clusters are proposed in the first half of the iterations, and most of the later 
iterations provide useless clusters. Therefore, a reasonable number of iterations 
should be determined by inspection in order to stop the clustering algorithm. 




Fig. 2. Quantitative Result of Contextual Clustering 



A concept category groups semantic concepts that serve similar roles or present 
similar intentions in an utterance. At this point, the meanings of the clusters generated 
in the previous experiments are undefined. After the clustering results are examined, 
a total of 44 concepts are identified. We re-label the original corpus with these concepts 
and cluster the corpus again. Table 3 shows some examples of the result. 

Although there are only a few meaningful categories, the result 
indicates that some concepts can be grouped into proper categories. A complete concept 
set is formulated by using the proposed semi-automatic algorithm. The experiments 
show that the algorithm helps humans to craft domain knowledge. The extracted 
semantic concepts will be used in the following sections. 
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Table 3. Categorization of Semantic Concepts 



Concept Category          Semantic Concept
Domain Slot Values        (BIRTHDATE_VALUE), (IDNUMBER_VALUE), (RECORDNUM_VALUE),
                          (DATE_VALUE), (TIME_VALUE)
Domain Slot Names         (DATA), (ID_NUM), (RECORD_NUM), (PATIENT_TYPE),
                          (DEPARTMENT_TYPE)
Domain Specific Actions   (CHECK), (CANCEL), (CHANGE), (REGISTER)
Frequently Used Verbs     (WANT), (REQUEST)



4 Identification of Speech Act 



Speech acts represent the mental state of the participants in a dialogue. Some words 
in each utterance are replaced with the semantic concepts derived in the last section. A 
set of speech acts is defined, including Request-ref, Answer-ref, Request-if, Answer-if, 
Request-fact, Answer-fact, Greeting, Prompt, Clarify, Accept, and Reject [4]. Four 
trained annotators were asked to tag the corpora with these speech acts; their tags serve 
as the answer set for evaluation. Formula 3 defines the Bayesian identification. Given 
a set of conceptual features in an utterance, which are denoted by semantic tags $C_k$ 
($k = 1, \ldots, M$), we try to find the speech act $A_t$ with the largest probability. Here we 
assume that the concepts are independent of one another. 

$$\arg\max_{A_t} P(A_t \mid C_1 = c_1, \ldots, C_M = c_M) = \arg\max_{A_t} P(A_t) \prod_{k=1}^{M} \frac{P(C_k = c_k \mid A_t)}{P(C_k = c_k)} \qquad (3)$$
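
A Bayesian identifier of this kind can be sketched in Python as follows. This is an illustration under stated assumptions rather than the system evaluated here: the class name, the add-one smoothing, and the toy training pairs are hypothetical, and the denominator $P(C_k = c_k)$ is omitted because it does not depend on $A_t$ and therefore does not change the arg max.

```python
import math
from collections import Counter, defaultdict


class SpeechActClassifier:
    """Naive Bayes over semantic tags (hypothetical helper, add-one smoothing assumed)."""

    def __init__(self):
        self.act_counts = Counter()            # how often each speech act was seen
        self.tag_counts = defaultdict(Counter)  # tag frequencies per speech act
        self.tags = set()                       # all tags seen in training

    def train(self, examples):
        """examples: iterable of (list_of_semantic_tags, speech_act)."""
        for tags, act in examples:
            self.act_counts[act] += 1
            for tag in tags:
                self.tag_counts[act][tag] += 1
                self.tags.add(tag)

    def predict(self, tags):
        """Return the act maximizing log P(A) + sum_k log P(c_k | A)."""
        total = sum(self.act_counts.values())
        best_act, best_score = None, -math.inf
        for act, n in self.act_counts.items():
            # One extra bucket keeps the estimate defined for unseen tags.
            denom = sum(self.tag_counts[act].values()) + len(self.tags) + 1
            score = math.log(n / total)
            for tag in tags:
                score += math.log((self.tag_counts[act][tag] + 1) / denom)
            if score > best_score:
                best_act, best_score = act, score
        return best_act


# Hypothetical toy annotations using concept names from Table 3.
clf = SpeechActClassifier()
clf.train([
    (["WANT", "ID_NUM"], "Request-ref"),
    (["WANT", "RECORD_NUM"], "Request-ref"),
    (["IDNUMBER_VALUE"], "Answer-ref"),
    (["DATE_VALUE"], "Answer-ref"),
])
predicted = clf.predict(["WANT"])
```

Working in log space avoids numerical underflow when an utterance carries many tags; the smoothing choice matters most for small corpora such as the ones used here.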



The precision of the experimental result is 57%. Compared to raw material (i.e., 
the words with the N highest tf*idf values in an utterance are regarded as features), which 
yields a precision of only 18%, semantic concepts mitigate the data sparseness problem. The poor 
performance of raw data is due to the small size of the corpus from which tf and idf 
are trained. To tell whether the derived concepts are redundant, we divide the concepts into 
several subgroups and repeat the experiments. We find that the best 
performance occurs when all the concepts are adopted. This shows that the derived 
semantic concepts do capture specific features of domain utterances and are not 
redundant. 

See5 [8], a machine learning tool, is also adopted to identify speech acts. 
Semantic tags extracted from each utterance are used as attributes. 
The precision is 65% when all the concepts are considered as clues. Compared with 
the performance using the raw material, i.e., 53%, semantic concepts bring a smaller gain 
here than with the Bayesian method. 



5 Dialogue Modeling 

We adopt a frame-based model to formulate the dialogue behavior in this paper. The 
agreement of contexts between participants in a dialogue is based on the content of the 
frame slots. Predicting the condition of each frame slot can capture the transition 
process of dialogues. An information state consists of several semantic slots selected 
from the semantic concepts computed in Section 3. In addition, several conditions are 
defined to represent each slot state, including in question, mentioned, with value, and 
empty, which are denoted by the symbols *, +, 1, and 0, respectively, in Table 4. 
Initially, all the frame slots are set to 0. The following algorithm determines 
the state of each frame slot. 

1. Check if any slot value is confirmed. If so, fill the slot with the value and set the 
state to 1. 

2. Determine if any questioning clues exist. If so, tag each slot name mentioned 
along with questioning clues, and set the state to *. 

3. Tag all mentioned slot names as mentioned, and set the state to +. 
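
The three steps can be sketched as a small Python function. The function and argument names are hypothetical, and two points the text leaves open are resolved by assumption: a value confirmed in the current utterance takes precedence over the other two steps, and a slot that already holds a state is not downgraded to "mentioned".

```python
EMPTY, MENTIONED, WITH_VALUE, IN_QUESTION = "0", "+", "1", "*"


def update_frame(frame, confirmed_values, mentioned_slots, has_question_clue):
    """Return the next frame state after one utterance.

    frame:             dict slot name -> state symbol (previous state)
    confirmed_values:  dict slot name -> value confirmed in this utterance
    mentioned_slots:   set of slot names mentioned in this utterance
    has_question_clue: True if the utterance contains a questioning clue
    """
    new_frame = dict(frame)
    # Step 1: confirmed values fill the slot.
    for slot in confirmed_values:
        new_frame[slot] = WITH_VALUE
    for slot in mentioned_slots:
        if slot in confirmed_values:
            continue  # assumption: a confirmed value wins
        if has_question_clue:
            # Step 2: slot names mentioned along with questioning clues.
            new_frame[slot] = IN_QUESTION
        elif new_frame[slot] == EMPTY:
            # Step 3 (assumption: only empty slots become "mentioned").
            new_frame[slot] = MENTIONED
    return new_frame


slots = ["branch name", "service type", "department type", "doctor name", "week date"]
frame = {slot: EMPTY for slot in slots}
# "I want to register outpatient service of ophthalmology department."
frame = update_frame(
    frame,
    {"service type": "outpatient", "department type": "ophthalmology"},
    set(),
    False,
)
# "I wonder on what day there is such a service?"
frame = update_frame(frame, {}, {"week date"}, True)
```

Because each call starts from the previous frame, the state after an utterance carries the accumulated dialogue history.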

Using this algorithm, an input utterance is transformed into an information state 
representation. A total of 13 slots, including branch name, service type, department type, 
doctor name, week date, time, birth date, ID number, medical record number, patient 
name, number of times coming, and general number, are selected from the derived 
semantic concepts and form the set of frame states. Not all slots are presented in 
Table 4; numbers 1-5 denote the first five slots listed above. 

Table 4. Examples of Transformation of Information State 

Utterance 1 (User): (I want to register outpatient service of ophthalmology department.)
    Concepts chunked: (department type) (service type) (person) (want) (register)
Utterance 2 (User): (I wonder on what day there is such a service?)
    Concepts chunked: (week) (question word)
Utterance 3 (System): (Would you like to go to main hospital or Gong-gwan branch?)
    Concepts chunked: (branch name1) (branch name2) (request) (at) (see doctor)
                      (question word)

Slot   Utterance 1 (User)   Utterance 2 (User)   Utterance 3 (System)
1      0                    0                    [illegible]
2      1                    1                    1
3      1                    1                    1
4      0                    0                    0
5      0                    *                    *

Following the above procedure, a dialogue corpus can be transformed into 
a sequence of information state transitions. By using an information state as a 
clue, we transform the problem of modeling a complete dialogue into that of 
predicting the next information state. Because the history of the dialogue is accumulated in 
terms of state transitions, every prediction of the next state actually takes the whole 
previous dialogue history into account. 

Due to the sparseness and small size of the corpus, Bayesian prediction is 
divided into two phases: 1) predict the entire set of states when the previous state 
actually occurred in the training corpus; and 2) if not, assume that each slot is 
independent, and predict each individually. In Formulas 4, 5, 6 and 7, $S_{i+1}$ is the 
whole frame state at time $i+1$; $s_{i+1}$ is a candidate for the next slot state; $s_i$ is the current 
slot state; $f_j$ is a feature present in the current utterance; and $K$ is the number of 
features. 

K.-K. Lin and H.-H. Chen 



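$P(S_{i+1} \mid S_i)$ = [right-hand side illegible in the source] (4)

$$p(s_{i+1} \mid s_i) = \frac{P(s_{i+1}, s_i)}{P(s_i)} \qquad (5)$$

$$p(s_{i+1} \mid f) = \prod_{j=1}^{K} P(F_j = f_j \mid s_{i+1}) \qquad (6)$$

$$\mathit{score}(s_{i+1}) = p(s_{i+1} \mid s_i) \cdot p(s_{i+1} \mid f) \qquad (7)$$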



In the experiment, there are 440 transitions. In 209 of these transitions, the 
previous frame state occurred in the training corpus, and the predictions are made directly 
in phase 1. In the other 231 transitions, each slot state is predicted in phase 2. The 
next frame state can be predicted from the training experience in about 60% (i.e., 127 
correct among 209 example frames) of the transitions. If the criterion is relaxed 
to determining whether a slot state should appear, the precision rises to 82%. Table 5 
summarizes the experimental results of phase 1. It shows that if there is a large 
enough training corpus to obtain more reliable dialogue states, the prediction of the next 
frame state is feasible. For those states not seen in the training set, phase 2 predicts each 
slot state separately. Table 6 shows the result. Each frame contains 13 slots to be 
predicted. The overall precision is 91.2%. After further analyzing the distribution of 
slot states, we found that most of the slot states (i.e., 97.4%) do not change through 
transitions. In other words, it is much easier to predict an unchanged slot state than a 
changed one. In our experiment, 50.6% of the slot states that changed are 
predicted correctly, whereas 92.34% of the slot states that did not change 
are predicted correctly. 



Table 5. Phase 1 Result 

Number of frames   209
Correct            127
Plain Correct      45
Incorrect          37
Precision          82.3%



A machine learning method is also applied to predict each slot state. Semantic 
features, the previous slot state, the role and the speech act of the current utterance are 
considered as the attributes for each slot state. The result shows that the most 
important attributes are the previous slot states. The precision drops from 98% to 7% 
when the previous slots are not considered. 
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Table 6. Phase 2 Result 



Number of frames   231
Number of slots    3003
Correct            2740
Incorrect          263
Precision          91.2%
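
The figures reported in Tables 5 and 6 can be cross-checked with a few lines of Python. The reading that the relaxed 82.3% precision counts the "Plain Correct" frames together with the "Correct" ones is an assumption; it is adopted here because it reproduces the reported value.

```python
def precision_summary():
    """Recompute the precisions of Tables 5 and 6 from their raw counts."""
    # Table 5 (phase 1)
    frames, correct, plain_correct, incorrect = 209, 127, 45, 37
    assert correct + plain_correct + incorrect == frames
    # Table 6 (phase 2): 231 frames with 13 slots each
    phase2_frames, slots_per_frame = 231, 13
    phase2_slots = phase2_frames * slots_per_frame
    phase2_correct, phase2_incorrect = 2740, 263
    assert phase2_correct + phase2_incorrect == phase2_slots
    return {
        "phase1_strict": correct / frames,
        # Assumption: the relaxed criterion also counts "Plain Correct".
        "phase1_relaxed": (correct + plain_correct) / frames,
        "phase2_slots": phase2_slots,
        "phase2": phase2_correct / phase2_slots,
    }


for name, value in precision_summary().items():
    print(name, value if isinstance(value, int) else f"{value:.1%}")
```

The counts are mutually consistent: 127/209 is about 60.8%, (127 + 45)/209 is about 82.3%, 231 × 13 = 3003 slots, and 2740/3003 is about 91.2%.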




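Fig. 3. Results of Word Combination in Different Corpora 
[plot omitted; legend: ATIS Word, TWR Word, NTUH Word, and one series whose label is illegible]

Fig. 4. Results of Clustering in Different Corpora 
[plot omitted; x-axis: Iteration; legend: ATIS Correct, TWR Correct, NTUH Correct, and one series whose label is illegible]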



6 Shifting Domain 

The proposed method was evaluated on the NTUH service domain in the previous 
sections. To test its portability, the method is also applied to the other three 
domains. 

In semantic concept acquisition, each domain shows a similar 
tendency, as shown in Figures 3 and 4. The differences lie mainly in the different sizes 
and characteristics of the corpora. For example, the CHT corpus contains many named 
entities and special representations, so the proportion of correct combinations is 
higher. 

In speech act identification, the precision rates in the CHT corpus are 77% with the 
machine learning method and 57% with the Bayesian method. CHT is a comparatively 
simpler domain than NTUH. The most frequent speech acts in the CHT corpus are 
Request-ref and Answer-ref. In dialogue modeling, the average precision of frame state 
prediction (phase 1) is 85%, and that of slot state prediction (phase 2) is 87%. 



7 Conclusions and Future Work 

This paper proposes a systematic procedure to extract domain knowledge for dialogue 
adaptation. Semantic acquisition extracts key concepts semi-automatically to reduce 
the cost of human intervention. Speech act identification recognizes the current intent and focus 
of an utterance. Regarding derived features as frame states, dialogue transition is 
modeled as prediction of frame states. Applying the procedure to four different 
domains shows its portability. More data and more domains will be examined 
in future work. 
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Abstract. Classic parsing methods use complete search techniques to 
find the different interpretations of a sentence. However, the size of the 
search space increases exponentially with the length of the sentence 
or text to be parsed and the size of the grammar, so that exhaustive 
search methods can fail to reach a solution in a reasonable time. 
Nevertheless, large problems can be solved approximately by some kind 
of stochastic techniques, which do not guarantee the optimum value, 
but allow adjusting the probability of error by increasing the number of 
points explored. Evolutionary Algorithms are among such techniques. 
This paper presents a stochastic chart parser based on an evolutionary 
algorithm which works with a population of partial parsings. The paper 
describes the relationships between the elements of a classic chart 
parser and those of the evolutionary algorithm. The model has been 
implemented, and the results obtained for texts extracted from the 
Susanne corpus are presented. 

Keywords: Evolutionary programming, Partial Parsing, Probabilistic 
Grammars 



1 Introduction 

Parsing a sentence can be seen as a procedure that searches for different ways of 
combining grammatical rules to find a combination which could be the structure 
of the sentence. A bottom-up parser starts with the sequence of lexical classes 
of the words, and its basic operation is to take a sequence of symbols and match it 
to the right-hand side of the rules. Thus, this parser can be implemented simply 
as a search procedure for this matching process. However, such an implementation 
would be extremely expensive because the parser would try the same matches 
again and again. This problem is avoided in chart parsing algorithms by 
introducing a data structure called a chart [1]. This structure stores the partial 
results of the matchings already done. 

Classical parsing methods are based on complete search techniques to find 
the different interpretations of a sentence. However, experiments on human parsing 
suggest that people do not perform a complete search of the grammar while 
parsing. On the contrary, human parsing seems to be closer to a heuristic process 
with some random component. This suggests exploring alternative search 
methods. Another central point when parsing is the need to select the “most” 
correct parsing from the multitude of possible parsings consistent with the grammar. 
In such a situation, some kind of disambiguation is required. Statistical 
parsing provides a way of dealing with disambiguation. Stochastic grammars [4], 
obtained by supplementing the elements of algebraic grammars with probabilities, 
represent an important part of the statistical methods in computational 
linguistics and have allowed important advances in areas such as disambiguation 
and error correction. A probabilistic context free grammar, PCFG, is defined as 
a CFG along with a set of probabilities on the rules such that 

$$\sum_{j} P(N^i \to \zeta^j) = 1$$

for all $i$, where $N^i$ is a nonterminal and $\zeta^j$ is a sequence of terminals and 
nonterminals. 

* Supported by projects TIC2003-09481-C04 and 07T/0030/2003. 
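
The normalization condition can be checked mechanically. The following Python sketch assumes a hypothetical rule representation as `(lhs, rhs, probability)` triples and a toy grammar that is not taken from the Susanne corpus.

```python
import math
from collections import defaultdict


def is_proper_pcfg(rules):
    """Check that, for every nonterminal, the rule probabilities sum to 1.

    rules: iterable of (lhs, rhs_tuple, probability) triples
    (a hypothetical representation chosen for this sketch).
    """
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return all(math.isclose(total, 1.0) for total in totals.values())


# Toy grammar for illustration only.
toy_rules = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("Det", "N"), 0.7),
    ("NP", ("N",), 0.3),
    ("VP", ("V", "NP"), 1.0),
]
proper = is_proper_pcfg(toy_rules)
```

Grammars estimated from a treebank by relative frequency satisfy this condition by construction; the check is mainly useful after rules have been pruned or edited by hand.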

In order to improve the efficiency of a parser based on a PCFG, we can 
develop algorithms that attempt to explore the high-probability components 
first. These are called best-first parsing algorithms. The goal is to find the best 
parse quickly and thus to avoid exploring much of the search space. Chart parsing 
algorithms can be easily modified to consider the most likely components first. 

Another alternative for searching the parses is provided by evolutionary algorithms (EAs). 
EAs have already been applied to some issues of natural language processing 
[8], such as query translation [10], inference of context-free grammars [13,12,9,7], 
tagging [3], parsing [2], word sense disambiguation [5], and information 
retrieval [6]. In the system described in [2], parse trees are randomly generated and 
combined. The evolutionary process is in charge of giving a low probability 
of survival to those trees which do not match the grammar rules properly. This 
system has been tested on a set of simple sentences, but the population size 
required to parse real sentences with real grammars, such as those extracted 
from a linguistic corpus, is too large for the system to work properly. 

This paper presents the implementation of a stochastic bottom-up chart 
parser based on an evolutionary algorithm which works with a population of 
partial parsings. The algorithm produces successive generations of individuals, 
computing their quality or “fitness” at each step and selecting the best of them 
for the next generation. The purpose of most EAs is to find a good solution and 
not necessarily the best solution, and this is enough for most natural language 
statistical processes. EAs provide at the same time a reasonable accuracy as well 
as a unified scheme of algorithm applicable to different problems. 

The rest of the paper proceeds as follows: Section 2 describes the evolutionary 
parser, presenting the main elements of the evolutionary algorithm; Section 3 
presents and discusses the experimental results; and Section 4 draws the main 
conclusions of this work. 

function ArcExtension(chart, agenda, grammar, C, p1, p2) 
    Insert(chart, C, p1, p2); 
    for each active arc X → X1 ... ∘ C ... Xn from p0 to p1 do { 
        AddNewActiveArc(chart, X → X1 ... C ∘ ... Xn, p0, p2); 
    } 
    for each active arc X → X1 ... Xn ∘ C from p0 to p1 do { 
        AddNewComponent(agenda, X, p0, p2); 
    } 
end 

Fig. 1. Arc extension algorithm to add a component from position p1 to position p2 



2 Evolutionary Algorithm for Chart Parsing 

The algorithm can be viewed as a probabilistic implementation of a bottom-up 
chart parser. In a chart parser, the chart structure stores the partial results of 
the matchings already done. Matches are always attempted from one component, 
called the key. To find rules that match a string involving the key, the algorithm looks 
for rules which start with the key, or for rules which have already been started 
by earlier keys and require the present key either to extend or to complete the 
rule. The chart records all components derived from the sentence so far in the 
parse. It also maintains the record of rules that have partially matched but are 
incomplete. These are called active arcs. The basic operation of a chart parser 
consists in combining an active arc with a completed component. The result is 
either a new completed component or a new active arc that is an extension of 
the original active arc. Completed components are stored in a list called the agenda 
until they are added to the chart. This process is called the arc extension algorithm, 
of which Figure 1 shows a scheme. To add a component $C$ into the chart from 
position $p_1$ to position $p_2$, $C$ is inserted into the chart between those positions. 
Then, for any active arc of the form $X \to X_1 \ldots \circ C \ldots X_n$ (where $\circ$ denotes the 
key position) from $p_0$ to $p_1$, a new active arc $X \to X_1 \ldots C \circ \ldots X_n$ is added from 
position $p_0$ to $p_2$. Finally, for each active arc $X \to X_1 \ldots X_n \circ C$ from position 
$p_0$ to $p_1$, which only requires $C$ to be completed, a new component of type $X$ is 
added to the agenda from position $p_0$ to $p_2$. 

Figure 2 shows a scheme of the chart 
parsing algorithm. It consists of a loop repeated until there is no input left. At 
each iteration, if the agenda is empty, the lexical categories for the next word of 
the sentence are added to the agenda. Then a component $C$ is selected from the 
agenda. Let us assume it goes from position $p_1$ to $p_2$. For each grammar rule of 
the form $X \to C X_1 \ldots X_n$, a new active arc $X \to \circ C X_1 \ldots X_n$ is added 
from position $p_1$ to $p_2$. Finally, $C$ is added to the chart by means of 
the arc extension algorithm. 

Probabilistic chart parsing algorithms consider the most likely components 
first. The main idea is to implement the agenda as a priority queue, where the 
highest-rated elements are always first in the queue. Accordingly, the parser always 
removes the highest-ranked component from the agenda and adds it to the chart. 

function ChartParser(sentence, chart) 
    while cont < Length(sentence) do { 
        if Empty(agenda) { 
            AddInterpretation(sentence[cont]); 
            cont++; 
        } 
        SelectComponent(agenda, C, p1, p2); 
        for each grammar rule X → C X1 ... Xn do { 
            AddNewActiveArc(chart, X → ∘ C X1 ... Xn, p1, p2); 
        } 
        ArcExtension(chart, agenda, grammar, C, p1, p2); 
    } 
end 

Fig. 2. Bottom-up chart parsing algorithm 
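
To make the interplay of Figures 1 and 2 concrete, the following Python sketch implements a non-probabilistic bottom-up chart recognizer. It is an illustration, not the parser described in this paper, and it rests on several assumptions: rules are hypothetical `(lhs, rhs)` pairs, the agenda is a plain stack, a new arc for a rule starting with the key is created empty at position p1 so that arc extension advances it over the key, and the loop runs until both the input and the agenda are exhausted.

```python
from collections import namedtuple

# An active arc: rule lhs -> rhs with the key position before rhs[dot],
# covering the input from position start to position end.
Arc = namedtuple("Arc", "lhs rhs dot start end")


def arc_extension(chart, arcs, agenda, c, p1, p2):
    """Insert component c (p1..p2) and extend every active arc waiting for it."""
    chart.add((c, p1, p2))
    for arc in list(arcs):  # iterate over a copy, arcs grows inside the loop
        if arc.end == p1 and arc.rhs[arc.dot] == c:
            if arc.dot + 1 == len(arc.rhs):
                # The arc only needed c: a new completed component.
                agenda.append((arc.lhs, arc.start, p2))
            else:
                # Otherwise move the key position past c.
                arcs.add(arc._replace(dot=arc.dot + 1, end=p2))


def chart_parse(words, lexicon, grammar):
    """Return the set of completed components (category, start, end)."""
    chart, arcs, agenda = set(), set(), []
    pos = 0
    while agenda or pos < len(words):
        if not agenda:
            # Read the next word and add its lexical categories.
            for tag in lexicon[words[pos]]:
                agenda.append((tag, pos, pos + 1))
            pos += 1
        c, p1, p2 = agenda.pop()
        if (c, p1, p2) in chart:
            continue  # component already processed
        for lhs, rhs in grammar:
            if rhs[0] == c:
                # Assumption: start the arc empty at p1; arc_extension advances it.
                arcs.add(Arc(lhs, rhs, 0, p1, p1))
        arc_extension(chart, arcs, agenda, c, p1, p2)
    return chart


# Toy grammar and lexicon for illustration only.
toy_grammar = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]
toy_lexicon = {"the": ["Det"], "dog": ["N"], "saw": ["V"], "cat": ["N"]}
toy_chart = chart_parse("the dog saw the cat".split(), toy_lexicon, toy_grammar)
```

A best-first variant would replace the stack with a priority queue (for example Python's `heapq`) ordered by component probability, so that the most likely component is always selected first.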



In the evolutionary parser (EP), the population is composed of partial parses 
of different sequences of words of the sentence, which are randomly combined to 
produce new parses for longer sequences of words. Thus, the agenda of a chart 
parser, which stores completed components, is represented by the population of 
the EA. The arc extension algorithm is represented by the crossover operator, 
which combines partial parses until the categories required by the 
right-hand side of a rule are completed, so as to produce a new completed component of the 
class given by the left-hand side of the rule. At this point, the EP differs from 
chart parsing because the arc extension algorithm is applied continuously 
until a component is completed. Thus, in the EA active arcs only exist during 
the crossover operation, because when this finishes every arc explored has been 
completed. Another difference comes from the way of selecting the rules to be 
applied. A best-first parsing algorithm always selects the most likely rule, while 
the EA can select any rule, though those of higher probability have more chances 
to be selected. 
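
The difference between the two selection policies can be illustrated in Python. The function names and the `(lhs, rhs, probability)` rule triples are hypothetical, and sampling in proportion to rule probability is only one plausible way of giving more likely rules more chances; the operator actually used by the EP may differ.

```python
import random


def select_rule_best_first(rules):
    """Best-first policy: always take the most probable rule."""
    return max(rules, key=lambda rule: rule[2])


def select_rule_stochastic(rules, rng=random):
    """EA-style policy (assumed): any rule may be chosen, weighted by probability."""
    weights = [prob for _lhs, _rhs, prob in rules]
    return rng.choices(rules, weights=weights, k=1)[0]


# Toy rules for illustration only.
np_rules = [("NP", ("Det", "N"), 0.7), ("NP", ("N",), 0.3)]
best = select_rule_best_first(np_rules)
sampled = select_rule_stochastic(np_rules, random.Random(0))
```

With the stochastic policy a low-probability rule is still explored occasionally, which is what lets the population reach parses that a purely greedy choice would never build.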

Finally, the input data for the EA are those of the classic chart parser: the 
sentence to be parsed, the dictionary from which the lexical tags of the words 
can be obtained along with their frequencies, and the PCFG, together with the 
genetic parameters (population size, crossover and mutation rates, etc.). 

Let us now consider each element of the algorithm separately. 



2.1 Chart Representation: The EA Population 

The chart data structure stores all intermediate results of the parsing, that is, 
any valid parse of the substrings of the input. These intermediate results are the 
edges of the chart. Thus, each edge stores three things: the grammar rule, the 
parse subtree and the corresponding locations in the sentence. 

Individuals in our system can be viewed as edges, directly represented by
subtrees. They are parses of segments of the sentence, that is, they are trees
obtained by applying the probabilistic CFG to a sequence of words of the
sentence. Each individual is assigned a syntactic category: the left-hand side
of the top-level rule in the parse. The probability of this rule is also
registered, as are the first word of the sequence parsed by the tree, the number
of words of that sequence and the number of nodes of the tree. Each tree is
composed of a number of subtrees, each of them corresponding to a required
syntactic category of the right-hand side of the rule. Figure 3 shows some
individuals for the sentence The new promotion manager has been employed by the
company since January +, 1946 +, as a commercial artist in the advertising
department +., used as a running example, which has been extracted from the
Susanne corpus. We can see that there are individuals composed of a single word,
such as Ind. 1, while others, such as Ind. 5, are parse trees obtained by
applying different grammar rules. For the former, the category is the chosen
lexical category of the word (a word can belong to more than one lexical class);
e.g., the category of Ind. 1 is AT1. For the latter, the category is the
left-hand side of the top-level rule; e.g., the category of Ind. 5 is Ns.

[Figure 3: tree diagrams of six individuals, not reproduced here. Ind. 1, Ind. 2
and Ind. 3 are the single-word trees AT1 (a), JJ (commercial) and NN1c (artist);
Ind. 4 is a tree of category P ending in "the advertising department"; Ind. 5 is
a tree of category Ns over AT1 JJ NN1c P; Ind. 6 is a tree of category N over
NPM1 YC MCy YC (January , 1946 ,).]

Fig. 3. Examples of individuals for the sentence The new promotion manager has been
employed by the company since January +, 1946 +, as a commercial artist in the
advertising department +.
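
As an illustration, the record kept for each individual can be sketched as follows. This is a minimal Python sketch, not the C++ implementation used in this work; the class name `Individual`, the helpers `leaf` and `combine`, the rule Ns -> AT1 JJ NN1c and all probabilities are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Individual:
    """A completed parse of a contiguous segment of the sentence."""
    category: str      # category of the root (lexical tag for a leaf)
    rule_prob: float   # probability of the top-level rule (tag frequency for a leaf)
    first_word: int    # position of the first word covered
    words: List[str]   # the words covered, in order
    children: List["Individual"] = field(default_factory=list)

    @property
    def n_words(self) -> int:
        return len(self.words)

    @property
    def n_nodes(self) -> int:
        # the root plus the nodes of every subtree
        return 1 + sum(child.n_nodes for child in self.children)


def leaf(tag, word, position, freq):
    """Single-word individual, as Ind. 1 in Figure 3."""
    return Individual(tag, freq, position, [word])


def combine(category, rule_prob, parts):
    """Individual built from adjacent sub-individuals, as Ind. 5 in Figure 3."""
    words = [w for part in parts for w in part.words]
    return Individual(category, rule_prob, parts[0].first_word, words, list(parts))


# Hypothetical values: positions count words from 0 in the running example.
a = leaf("AT1", "a", 16, 0.9)
commercial = leaf("JJ", "commercial", 17, 0.8)
artist = leaf("NN1c", "artist", 18, 0.7)
ns = combine("Ns", 0.05, [a, commercial, artist])
```

Storing the number of words and nodes as derived properties keeps them consistent with the tree; a real implementation might cache them instead.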



Chart Initialization: The Initial Population. The first step in parsing a 
sentence with a bottom-up chart parser is to initialize the chart with word edges. 
For each lexical rule which expands to a given word, a corresponding edge is 
created. This edge stores a leaf tree and a location equal to the location of the 
word in the sentence. 

Lexical rules can be obtained by finding the possible lexical tags of each
word. For example, for the word as, which has the tags CSA, IIa and RGa in the
Susanne corpus, we will include in the grammar the rules CSA -> as, IIa -> as
and RGa -> as. The lexical tags are obtained, along with their frequencies,
from a dictionary.

The initialization of our system amounts to creating the initial population
of the evolutionary algorithm which, as in the chart parser, is composed of
individuals that are leaf trees formed only by a lexical category of a word.
The system generates a different individual for each lexical category of each
word.

In order to improve the performance, the initial population also includes 
individuals obtained by applying a grammar rule provided that all the categories 
of the right-hand side of the rule are lexical. The individual 6 of Figure 3 is one 
such example. 
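
The creation of the leaf individuals can be sketched as below. This is only an illustrative Python sketch under stated assumptions: individuals are plain tuples, the function name `initial_population` is hypothetical, and the tag counts in `lexicon` are invented (only the three tags of "as" come from the text). The extra individuals built from rules with an all-lexical right-hand side are not covered.

```python
def initial_population(sentence, lexicon):
    """One leaf individual (tag, word, position, probability) per lexical tag.

    `lexicon` maps each word to a dict {tag: frequency count}; the probability
    of a leaf is the relative frequency of its tag for that word.
    """
    population = []
    for position, word in enumerate(sentence):
        tags = lexicon[word]
        total = sum(tags.values())
        for tag, count in tags.items():
            population.append((tag, word, position, count / total))
    return population


# Invented counts, for illustration only.
lexicon = {"as": {"CSA": 6, "IIa": 3, "RGa": 1}, "a": {"AT1": 5}}
population = initial_population(["as", "a"], lexicon)
```

With these counts the word "as" contributes three competing individuals, one per tag, while "a" contributes a single one.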

2.2 Arc-Extension Algorithm: The Genetic Operators 

The edge set represents the state of a chart parser during processing, and new
edges can be inserted into the set at any time. Once entered, an edge cannot be
modified or removed. New edges can correspond to complete components or to
active arcs, which are pending further extension to be completed.

In our system, active arcs have only a temporary existence and do not appear
in the population, which is composed only of complete components. The
process of extending an arc is not partitioned into a number of steps, but is
done as a whole in the crossover operation. New individuals are created by means
of two genetic operators: crossover and cut. The crossover operator combines a
parse with other parses present in the population to satisfy a grammar rule; cut
creates a new parse by randomly selecting a subtree from an individual of the
population. The rates of crossover and cut operations performed at each step
are input parameters, to which the algorithm is very sensitive.

At each generation the genetic operators produce new individuals, which are
added to the previous population, thereby enlarging it. The selection
process is in charge of reducing the population size down to the size specified
as an input parameter. Selection is performed with respect to the relative
fitness of the individuals, but it also takes into account other factors to
ensure the presence in the population of parses containing words that may be
needed in later generations.



Reproduction. The crossover operator produces a new individual by combining
an individual selected from the population with an arbitrary number of other
ones. Notice that crossover in this case does not necessarily occur in pairs.
The operator repeatedly applies the arc-extension algorithm until completed
components are obtained. The individuals to be crossed are randomly selected.
This selection does not consider the fitness of the individuals, because
completing some grammar rules may require individuals of a particular
syntactic category for which there are no high-fitness representatives.

Crossover begins by selecting an individual from the population. For example,
let us assume that individual 1 of Figure 3 is selected. Then the process
continues as follows:

- Identify the syntactic category of the root of the tree to be combined. For
  Ind. 1 it is AT1.

- Select among the grammar rules those whose right-hand side begins with
  this syntactic category. For the grammar used in this work, some examples
  of these rules for the AT1 category are:

  Ns -> AT1 JJ NN1c P

  Ns -> AT1 JJ NN1n P

  Ns -> AT1 JJ Tg NN1c P

  Ns -> AT1 NN1c Po

  Ns -> AT1 NN1c YC Nns YC MCn

  Ns -> AT1 NN1n Po

  Let us assume that the first rule is the selected one.

- For each category of the right-hand side of the rule after the first one,
  search in the population for an individual whose syntactic category matches
  the considered category, and whose sequence of words is the continuation of
  the words of the previous individual. In the example, we look for an
  individual of category JJ, another of category NN1c and another of category P.
  The sequence of words of the individual of category JJ must begin with the
  word commercial, the word which follows those of individual 1. Accordingly,
  individual 2 of Figure 3 is a valid one (likewise, individuals 3 and 4 are
  also chosen for the crossover).

- Construct a new individual which has in its root the syntactic category of
  the rule (Ns) and is composed of the subtrees selected in the previous step,
  which produces individual 5 of Figure 3.

- Add the new individual to the population.

With this scheme, the crossover of one individual may produce no descendant at
all, or may produce more than one descendant. In the latter case all descendants
are added to the population. The process of selection is in charge of reducing
the population down to the specified size. Figure 4 shows a scheme of the
operator. For each individual in the population the operator is applied
according to the crossover rate. If an individual is selected to be crossed, the
function SearchGrammarRules returns the set of grammar rules whose right-hand
side begins with the category of the individual. Then, for each of these rules,
the function SearchIndividuals searches the population for individuals to
complete the right-hand side of the rule. If they are found, new individuals are
created for each possible combination (CreateTree) and added to the population.

If crossover is applied alone, the mean size of the individuals increases at
each generation. Though this is advantageous, because in the end we are
interested in providing as solutions individuals which cover the whole sentence,
it may also induce some problems. If the selection process removes small
individuals which can only be combined in later generations, the parses of
these combinations will never be produced. This situation is prevented by
applying some constraints in the selection process, as well as by means of the
cut operator.
function Crossover(Population, per_crossover)
  for each individual in Population do {
    prob = random(100);
    if (prob < per_crossover) {
      Rules = SearchGrammarRules(individual);
      for each rule in Rules do {
        Trees = SearchIndividuals(Population, rule, individual);
        for each selection in Trees do {
          new_individual = CreateTree(rule, individual, selection);
          Population.add(new_individual);
        }
      }
    }
  }
end



Fig. 4. Scheme of the crossover operator 
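
The crossover scheme can also be sketched in runnable form. The following Python sketch is an illustration under stated assumptions, not the implementation used in this work: individuals are tuples (category, start, end, children) with `end` exclusive, rules are pairs (left-hand side, right-hand side), the function names mirror those of the scheme, and descendants are collected first and added after the loop rather than during it.

```python
import random

# An individual is a tuple (category, start, end, children);
# start/end are word positions, with `end` exclusive.


def search_grammar_rules(grammar, individual):
    """Rules whose right-hand side begins with the individual's category."""
    return [rule for rule in grammar if rule[1][0] == individual[0]]


def search_individuals(population, rule, individual):
    """Every sequence of adjacent individuals that completes the rule."""
    sequences = [[individual]]
    for category in rule[1][1:]:
        sequences = [seq + [cand]
                     for seq in sequences
                     for cand in population
                     if cand[0] == category and cand[1] == seq[-1][2]]
    return sequences


def crossover(population, grammar, per_crossover, rng=random):
    offspring = []
    for individual in population:
        if rng.random() * 100 < per_crossover:
            for rule in search_grammar_rules(grammar, individual):
                for parts in search_individuals(population, rule, individual):
                    offspring.append(
                        (rule[0], parts[0][1], parts[-1][2], tuple(parts)))
    population.extend(offspring)  # every descendant joins the population
    return population
```

Because the right-hand side is completed in one pass, no active arc is ever stored: a rule either yields complete individuals or nothing.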



The Cut operator. Because our chart representation does not ensure the
presence of every edge previously produced, we include the cut operator, which
allows a part of a parse to be extracted from an individual. The rate of
application of the cut operator increases with the length of the individuals,
while crossover is applied at a fixed rate. Thus, at the beginning of the
evolution process, crossover is applied almost alone, and so the length of the
individuals increases. Later on, when the individuals are long enough, cut and
crossover are applied together. Accordingly, the application of the cut operator
depends on two parameters, per_cut and threshold_cut. per_cut is the percentage
of application of cut, while threshold_cut is the minimum number of words of the
individual required to allow the application of cut. It is given as a percentage
of the length of the sentence being parsed. Figure 5 shows a scheme of this
operator. For each individual in the population, the conditions to apply cut are
checked. If they are fulfilled, a subtree of the parse tree of the individual is
randomly selected and added to the population.



function Cut(Population, per_cut, threshold_cut)
  for each individual in Population do {
    if (number_words(individual) > threshold_cut) {
      prob = random(100);
      if (prob < per_cut) {
        new_individual = choose_random_subtree(individual);
        Population.add(new_individual);
      }
    }
  }
end



Fig. 5. Scheme of the cut operator 
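
A runnable counterpart of the cut scheme is sketched below in Python. It is an illustration only: trees are tuples (category, words, children), the helper `subtrees` and the extra argument `sentence_length` (needed to turn the percentage threshold_cut into a number of words) are assumptions of this sketch.

```python
import random

# A tree is (category, words, children); children is a tuple of trees.


def subtrees(tree):
    """Yield every proper subtree of `tree`."""
    for child in tree[2]:
        yield child
        yield from subtrees(child)


def cut(population, per_cut, threshold_cut, sentence_length, rng=random):
    # threshold_cut is a percentage of the sentence length
    min_words = threshold_cut / 100 * sentence_length
    extracted = []
    for individual in population:
        if len(individual[1]) > min_words and rng.random() * 100 < per_cut:
            candidates = list(subtrees(individual))
            if candidates:  # a single-word individual has nothing to cut
                extracted.append(rng.choice(candidates))
    population.extend(extracted)
    return population
```

Since only individuals longer than the threshold qualify, cut stays inactive in early generations, which matches the behaviour described above.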
2.3 Selection of Components: Fitness Evaluation

A classic parser produces any valid parse of the substrings of the input. This 
is not guaranteed in an evolutionary algorithm, which tends to generate highly 
probable individuals, though individuals with low probability also have some 
opportunities to be produced and survive. Thus, we need a measure of the indi- 
vidual quality or fitness. 

The fitness function is basically a measure of the probability of the parse. It
is computed as the average probability of the grammar rules used to construct
the parse:

$$\mathrm{fitness}(T) = \frac{\sum_{s_i \in T} \mathrm{prob}(s_i)}{nn(T)}$$

where T is the tree to evaluate, s_i is each of its nodes and nn(T) is the
number of nodes. For a lexical category, the probability is the relative
frequency of the chosen tag.
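
As a worked illustration of this average, the Python sketch below computes the fitness of a small tree. The tuple representation (probability, children) and the probabilities are assumptions chosen for the example.

```python
def node_probs(tree):
    """Yield the probability stored at every node of a (prob, children) tree."""
    prob, children = tree
    yield prob
    for child in children:
        yield from node_probs(child)


def fitness(tree):
    """Average probability over all nodes of the tree."""
    probs = list(node_probs(tree))
    return sum(probs) / len(probs)


# A root rule of probability 0.2 over three leaves with invented tag frequencies.
example = (0.2, [(0.9, []), (0.5, []), (0.8, [])])
```

For this tree the fitness is (0.2 + 0.9 + 0.5 + 0.8) / 4 = 0.6. Averaging, unlike multiplying, does not penalize larger trees merely for having more nodes.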



2.4 The Selection Process 

Selection usually replaces some individuals of the population (preferably those
with lower fitness) by others generated by the genetic operators. However, there
are two issues that make selection a bit different in our case. First of all,
our genetic operators include every new individual in the population, which thus
grows arbitrarily and therefore needs to be reduced to a suitable size.
Secondly, if fitness were the only criterion to select the individuals to be
eliminated, individuals that are the only ones parsing a particular word of the
sentence could disappear, thus making it impossible to generate a complete parse
of the sentence in later generations. Accordingly, our selection process reduces
the size of the population by erasing individuals according to their fitness,
but always ensuring that each of their words is present in at least one other
individual. If the population size popu_size is not enough to allow this, the
parsing process is halted. However, this situation is not to be expected because
any population size larger than the length of the sentence is enough to
guarantee this condition. Figure 6 presents a scheme of the selection process.
First of all, in order to improve diversity, duplicated individuals are erased
from the population (Eliminate_duplicate). The function ChooseToErase returns
the sequence with the tentative order in which the individuals must be erased.
This order is randomly determined with probability inversely proportional to
the fitness. Then a loop which erases individuals is repeated until the
population is sufficiently reduced. The function SequencePresent checks that
every word in the sequence parsed by the individual ToErase[i] is present in at
least one other individual.
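
The selection procedure described above can be sketched as follows. This Python sketch rests on several assumptions: an individual is reduced to a pair (fitness, covered word positions), the tentative erase order is drawn with a weighted shuffle (random key u ** fitness, which corresponds to a weight of 1/fitness), and, unlike the procedure in the text, the sketch returns the population as is instead of halting when it cannot be reduced further.

```python
import random

# An individual is (fitness, covered): `covered` is a frozenset of word positions.


def selection(population, popu_size, rng=random):
    population = list(dict.fromkeys(population))  # erase duplicates, keep order
    # Tentative erase order: weighted shuffle with weight 1/fitness, so that
    # low-fitness individuals tend to come first (fitness must be > 0).
    to_erase = sorted(population, key=lambda ind: rng.random() ** ind[0],
                      reverse=True)
    for candidate in to_erase:
        if len(population) <= popu_size:
            break
        others = [ind for ind in population if ind != candidate]
        # erase only if every covered word survives in another individual
        if all(any(w in other[1] for other in others) for w in candidate[1]):
            population = others
    return population
```

The coverage check is what keeps the only parse of a given word alive, whatever its fitness.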

function Selection(Population, popu_size)
  Eliminate_duplicate(Population);
  ChooseToErase(Population, ToErase);
  i = 0;
  while Size(Population) > popu_size do {
    if SequencePresent(Population, ToErase[i]) {
      Erase(Population, ToErase[i]);
    }
    i++;
  }
end

Fig. 6. Scheme of the selection process

3 Experimental Results

The algorithm has been implemented in the C++ language and run on a Pentium III
processor. In order to evaluate its performance we have considered the
parsing of sentences extracted from the Susanne corpus [11], a database of
sentences from the Brown Corpus of written American English, manually
annotated with syntactic information. The Susanne analytic scheme has
been developed on the basis of samples of both British and American English.
The corpus comprises 64 files of annotated text and a lexicon. Each file has a
line for each word of the original text. In this corpus, punctuation marks and
the apostrophe-s suffix are treated as separate words and assigned separate
lines. Each line has six fields, each of which contains at least one character.
Figure 7 shows some lines of one Susanne file.



reference     status  wordtag  word    lemma   parse
A01:0120.21   -       AT       the     the     [Ns-.
A01:0120.24   -       NN1c     number  number  .
A01:0120.27   -       IO       of      of      [Po.
A01:0120.30   -       NN2      voters  voter   .Po]Ns-]



Fig. 7. Sequence of lines extracted from the file A01 of the Susanne corpus. The fields 
of each line are reference, status, wordtag, word, lemma and parse. 



The grammar used herein has been read off the parsed sentences of the Susanne
corpus. In order to simplify the process, those sentences which make reference
to elements outside them (trace sentences) have not been used to extract the
grammar. Each grammar rule is assigned a probability computed as its relative
frequency with respect to the other rules with the same left-hand side (see
footnote 1).

1 That is, if we are considering the rule r of the form A -> ..., the
probability of r is computed as:

$$P(r) = \frac{\#r}{\sum_{r' = A \to \cdots} \#r'}$$

where #r is the number of occurrences of r.
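
The relative-frequency estimate of the footnote can be illustrated with a short Python sketch. The function name `rule_probabilities` and the rule occurrences below are invented for the example; only the shape of the rules follows the text.

```python
from collections import Counter, defaultdict


def rule_probabilities(observed_rules):
    """Relative frequency of each rule among rules with the same left-hand side.

    `observed_rules` lists one (lhs, rhs) pair per occurrence in the treebank.
    """
    counts = Counter(observed_rules)
    totals = defaultdict(int)
    for (lhs, _rhs), count in counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]] for rule, count in counts.items()}


# Invented occurrences, for illustration only.
observed = ([("Ns", ("AT1", "NN1c", "Po"))] * 3
            + [("Ns", ("AT1", "NN1n", "Po"))]
            + [("Po", ("IO", "Ns"))])
probs = rule_probabilities(observed)
```

By construction the probabilities of all rules sharing a left-hand side sum to one, as required of a PCFG.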
3.1 Recall, Precision, and Accuracy

Recall, precision and accuracy are the most commonly used measures for parsing
evaluation. Precision is the proportion of brackets in the parse under
evaluation that match those in the correct tree; recall measures how many of
the brackets in the correct tree are in the parse; and accuracy is the
percentage of brackets from the parse which do not cross the brackets in the
correct parse.
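
These three measures can be made concrete with a small Python sketch. It assumes that a parse is given as a set of labelled brackets (label, start, end) with `end` exclusive, and that two brackets cross when their spans overlap without one containing the other; the function names and the toy brackets are hypothetical.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def evaluate(parse, gold):
    """Return (precision, recall, accuracy) for two sets of brackets."""
    parse, gold = set(parse), set(gold)
    matched = len(parse & gold)
    non_crossing = sum(
        not any(crosses(p[1:], g[1:]) for g in gold) for p in parse)
    return matched / len(parse), matched / len(gold), non_crossing / len(parse)


gold = {("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)}
parse = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4)}
precision, recall, accuracy = evaluate(parse, gold)
```

In this toy case only the S bracket matches, so precision and recall are both 1/3, while two of the three proposed brackets are compatible with the correct tree, giving an accuracy of 2/3.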

One of the several reasons why a parser can produce a wrong parse for a
sentence is that the necessary rules are not present in the grammar due to a
lack of statistics in the corpus. This is a serious problem when using the
Susanne corpus because of its large tag sets. However, if we are mainly
interested in evaluating a parser, this problem can be circumvented by applying
the parser to sentences from the training corpus. Thus we have tested the parser
on a set of 17 sentences from the training corpus (the average length of the
sentences is 30 words). In order to compare the EA with a classic parser, we
have implemented a classic best-first chart parsing (BFCP) algorithm. Table 1
shows the precision, recall, accuracy and tagging results obtained for grammars
of different sizes (best results achieved in ten runs). We can observe that the
results of the EP improve on those of a classic chart parser for the first
grammar. Though these results get a bit worse when the size of the grammar is
enlarged, they can be improved again by modifying the parameters of the
evolutionary algorithm (those employed are suitable for the grammar of 225
rules). In any case, the Susanne corpus produces grammars that are too large
and thus inappropriate for the EP, so we expect to improve the results by using
a more appropriate corpus.

What is most relevant in the obtained results is that the EP is able to reach
100% in any of the three measures, while the probabilistic chart parser does
not reach this value, simply because the correct parse of some sentences is not
the most probable one. In this way the heuristic component of the evolutionary
algorithm shows its usefulness for parsing.



Table 1. Results obtained for different sizes of the grammar with a best-first chart 
parser (BFCP) and with the evolutionary parser (EP). 





                 225 rules         446 rules         795 rules
                 BFCP     EP       BFCP     EP       BFCP     EP
Precision        99.23    100      99.23    99.01    99.23    97.48
Recall           99.23    100      99.23    99.01    99.23    94.86
Accuracy         98.20    100      98.20    99.01    98.20    97.42
Tag. accuracy    100      100      100      100      100      99.61
4 Conclusions

This paper presents an implementation of a probabilistic chart parser by means
of an evolutionary algorithm. It works with a population of partial parses for
a given input sentence and a given grammar. Evolutionary algorithms allow
a statistical treatment of the parsing process, providing at the same time the
typical generality of evolutionary methods, which allows the same
algorithm scheme to be used to tackle different problems.

The grammar and sentences used to evaluate the system have been extracted
from the Susanne corpus. Measures of precision, recall and accuracy have been
provided, obtaining results which improve on those of a classic chart parser,
thus indicating that evolutionary algorithms are a robust approach for natural
language parsing. Moreover, the heuristic component of these algorithms seems
to harmonize with the nondeterministic nature of natural language tasks.

Another conclusion of this work is the inappropriateness of the Susanne corpus
for statistical processing. The sets of lexical and syntactic tags used in this
corpus are too large to obtain significant statistics. We thus plan to test the
system on other corpora.
Abstract. In this paper, we present a probabilistic shift-reduce parsing model
which can overcome the low context-sensitivity of previous LR parsing models.
Since previous models are restricted by the LR parsing framework, they can
utilize only a lookahead and an LR state (stack). The proposed model is not
restricted by the LR parsing framework, and is able to add rich contextual
information as needed. To show an example of contextual information designed for
applying the proposed model to Korean, we devise a new context scheme named
“surface-context-types”, which uses syntactic structures, sentential forms, and
selective lexicals. Experimental results show that the rich contextual
information used by our model can improve the parsing accuracy, and that our
model outperforms the previous models even when using a lookahead alone.



1 Probabilistic Shift-Reduce Parsing Model

Since the first approaches [1] and [2] to integrating a probabilistic method
with the LR parsing technique, some standard probabilistic LR parsing models
have been implemented. [3] and [4] (or [5]) defined a parse tree candidate T as
a transition sequence of LR states [3] or LR stacks [4] that is driven by an
action and a lookahead, as follows:

$$s_0 \overset{l_1, a_1}{\Longrightarrow} s_1 \overset{l_2, a_2}{\Longrightarrow} \cdots \overset{l_{m-1}, a_{m-1}}{\Longrightarrow} s_{m-1} \overset{l_m, a_m}{\Longrightarrow} s_m \;\; [3] \qquad \sigma_0 \overset{l_1, a_1}{\Longrightarrow} \sigma_1 \overset{l_2, a_2}{\Longrightarrow} \cdots \overset{l_{m-1}, a_{m-1}}{\Longrightarrow} \sigma_{m-1} \overset{l_m, a_m}{\Longrightarrow} \sigma_m \;\; [4] \qquad (1)$$

where $s_i$ and $\sigma_i$ are the i-th state [3] and stack [4], $l_i$ is the
i-th lookahead, $a_i$ is the action that can be performed for the lookahead and
the state (or the stack), and m is the number of actions needed to complete the
parsing procedure. A state/stack transition sequence gives the following
probabilistic model:

$$P(T) = \prod_{i=1}^{m} P(l_i, a_i, s_i \mid s_{i-1}) \;\; [3] \qquad P(T) = \prod_{i=1}^{m} P(l_i, a_i, \sigma_i \mid \sigma_{i-1}) \;\; [4] \qquad (2)$$

These models have low context-sensitivity, because the selection of an action
can be affected by information beyond what the LR parsing framework provides
(the LR parsing table, the LR stack and the lookahead).

As actions are performed, not only the stack but also the input changes. We
propose a probabilistic shift-reduce parsing model that considers both the stack
and the input word sequence $W = w_1 \ldots w_n$, as follows:

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 93-96, 2004. 
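$$P(T \mid W) = P(T \mid w_{1 \ldots n}, a_0, \sigma_0, l_0) = P(a_{1 \ldots m}, \sigma_{1 \ldots m}, l_{1 \ldots m} \mid w_{1 \ldots n}, a_0, \sigma_0, l_0) \qquad (3)$$

where $a_0$, $\sigma_0$, and $l_0$ are the initial action, stack (see footnote
1), and lookahead introduced for satisfying the formula. This equation is
decomposed as follows:

$$\begin{aligned}
P(a_{1 \ldots m}, \sigma_{1 \ldots m}, l_{1 \ldots m} \mid w_{1 \ldots n}, a_0, \sigma_0, l_0)
= \prod_{i=1}^{m} \; & P(l_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}) \\
\times \; & P(a_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}, l_i) \\
\times \; & P(\sigma_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}, a_i, l_i)
\end{aligned} \qquad (4)$$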



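We assume that the history of actions $a_{0 \ldots i-1}$ (until
$\sigma_{i-1}$) is not influential, and that only the latest action, stack and
lookahead have an effect on the next action, stack, and lookahead, namely:

$$\begin{aligned}
P(T \mid W) = \prod_{i=1}^{m} \; & P(l_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}) \\
\times \; & P(a_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}, l_i) \\
\times \; & P(\sigma_i \mid w_{1 \ldots n}, a_{0 \ldots i-1}, \sigma_{0 \ldots i-1}, l_{0 \ldots i-1}, a_i, l_i) \\
= \prod_{i=1}^{m} \; & P(l_i \mid w_{1 \ldots n}, \sigma_{i-1}, l_{i-1}) \times P(a_i \mid w_{1 \ldots n}, \sigma_{i-1}, l_i) \times P(\sigma_i \mid w_{1 \ldots n}, a_i, \sigma_{i-1}, l_i)
\end{aligned} \qquad (5)$$

In the above equation, the second factor can be estimated such that
$\sum_{a_i \in \{shift, reduce\}} P(a_i \mid w_{1 \ldots n}, \sigma_{i-1}, l_i) = 1$.
The first and the third factors are deterministically 1, because $l_i$ is
naturally determined by $w_{1 \ldots n}$, $\sigma_{i-1}$, and $l_{i-1}$, and
$\sigma_i$ can be uniquely determined by $a_i$, $\sigma_{i-1}$, and $l_i$. As a
result, the parse probability can be summarized as follows:

$$P(T \mid W) = \prod_{i=1}^{m} P(a_i \mid w_{1 \ldots n}, \sigma_{i-1}, l_i) \qquad (6)$$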

Here, a lookahead $l_i$, an element of W, also indicates the stack-input
boundary, which is an imaginary border line between the stack and the input. Our
model is not necessarily restricted by the LR parsing framework and can use rich
contextual information under proper assumptions. Moreover, it is more intuitive
than the previous models in that it evaluates the probability of an action for
the given conditional (contextual) information.
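
To make the product over actions concrete, the Python sketch below scores a shift-reduce derivation by summing log action probabilities, each conditioned on the stack before the action and the current lookahead. It is a simplified illustration: reduce is unlabelled and always merges the two topmost sub-trees, and `action_prob` is a hypothetical callback standing in for the estimated model.

```python
import math


def parse_log_prob(words, actions, action_prob):
    """Return (log P(T | W), final stack) for a sequence of shift/reduce actions.

    `action_prob(action, stack, lookahead)` gives P(a_i | stack before a_i, l_i).
    """
    stack, i, logp = [], 0, 0.0
    for action in actions:
        lookahead = words[i] if i < len(words) else None
        logp += math.log(action_prob(action, tuple(stack), lookahead))
        if action == "shift":
            stack.append(words[i])
            i += 1
        else:  # "reduce": merge the two topmost sub-trees (binary branching)
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
    return logp, stack


def uniform(action, stack, lookahead):
    # placeholder model: both actions equally likely in every context
    return 0.5


logp, stack = parse_log_prob(
    ["a", "b", "c"], ["shift", "shift", "reduce", "shift", "reduce"], uniform)
```

Working in log space avoids underflow for long action sequences; exponentiating `logp` recovers the product of the five action probabilities.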



2 Contextual Information for Shift-Reduce Parsing of Korean 

We show an example of contextual information designed for applying the proposed
shift-reduce parsing model to Korean. Using a shift-reduce parser, we generate
binary-branching phrase structures. Korean functional words are so well
developed that they can by themselves represent the structure of a phrase or a
sentential form. Based on this observation, we have devised context schemes
that mainly use functional words, named surface-context-types, which are
composed of the following three components:

• Surface-Phrasal-Type (SPT) represents the abbreviated syntactic structures of
the two sub-trees on the top of the stack $\sigma_i$ that are to be reduced or
not (we call them stack-top sub-trees from now on). We represent SPT as a
generalized sequence of the terminal functional words consisting of a quadruple
(nvfm_c, head_f, midp_f, rest_f). nvfm_c is the mnemonic selected among the
mnemonic sequence for noun/verb phrase [6]. The next three members correspond to
the three right-most functional words. In Fig. 1, SPT representations of the
left sub-tree NP ($subT(\sigma_i, L)$) and the right sub-tree PP
($subT(\sigma_i, R)$) are provided, where $subT(\sigma_i, L/R)$ is the
left/right stack-top sub-tree for the stack $\sigma_i$ and spt(t) is the SPT
representation of tree t.

[Figure 1: diagram not reproduced here. It shows a Korean example sentence
glossed “He was sent to the Hospital due to excessive eating”, with the
stack-top sub-trees NP and PP and their spt, sst and tbc values.]

Fig. 1. Contextual information used in the proposed model

1 Unlike [3] and [4], we assume that stack transition starts from $\sigma_1$,
not $\sigma_0$.

• Surface-Sentential-Type (SST) represents the sub-sentential forms outside the
stack-top sub-trees. We represent SST as a bit-string which is constructed by
turning each bit-field on or off according to whether specific functional words
exist. In Fig. 1, SST representations of the left area and the right area
outside NP and PP are provided, where sst(t, W, L/R) is the SST representation
of the area to the left/right of t for the given input word sequence W.



Table 1. Parsing accuracies as the contextual information is accumulated, compared with 
previous models. <1>~<4> denote contextual information used in our model. 





                    State            Stack            <1>         <2>        <3>        <4>
                    transition [3]   transition [4]   lookahead   <1>+spt    <2>+sst    <3>+tbc
Labeled Recall      71.22%           74.27%           75.57%      83.78%     83.98%     85.77%
Labeled Precision   71.30%           74.39%           75.59%      83.86%     84.06%     85.80%
Exact Matching      1.70%            3.77%            5.18%       14.81%     15.13%     16.65%



• Tree-Boundary-Contentword (TBC) is the right-most terminal content word of
each stack-top sub-tree. These words are adjacent to the boundary between the
stack-top sub-trees (called the ‘tree-boundary’) and are similar to content
phrasal heads. Among all the content words, we selectively lexicalise those that
contribute especially to sentence segmentation. Other words are replaced by
their parts of speech. Such words are effective in promoting the shift
probability for sub-trees that are likely to be reduced. In Fig. 1, the
left/right content words for the tree-boundary between NP and PP are provided,
where tbc(t) is the TBC for the tree t.

For our probabilistic shift-reduce parsing model, we assume that the contextual
information is represented by using the surface-context-types and a lookahead,
namely:

$$P(T) = \prod_{i=1}^{m} P(a_i \mid W, \sigma_{i-1}, l_i) = \prod_{i=1}^{m} P(a_i \mid spt(t_L), spt(t_R), sst(t_L, W, L), sst(t_R, W, R), tbc(t_L), tbc(t_R), l_i) \qquad (7)$$

where $t_L = subT(\sigma_i, L)$ and $t_R = subT(\sigma_i, R)$.



For calculating the action probabilities we use the maximum-likelihood estimation, 
and for handling the sparse-data problem we use a deleted interpolation method with 
a backing-off strategy similar to [6]. 
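
The idea of combining maximum-likelihood estimates over progressively less specific contexts can be sketched as follows. This Python sketch is not the method of [6], whose details are not given here: the class name `ActionModel`, the three context levels, the fixed interpolation weights (deleted interpolation would estimate them from held-out data) and the uniform fallback of 0.5 are all assumptions made for illustration.

```python
from collections import Counter


class ActionModel:
    """Interpolated relative-frequency estimate of P(action | context)."""

    def __init__(self, weights=(0.6, 0.3, 0.1)):
        self.weights = weights
        self.joint = Counter()    # (level, context, action) -> count
        self.context = Counter()  # (level, context) -> count

    @staticmethod
    def levels(spt, sst, lookahead):
        # contexts from most to least specific
        return [(spt, sst, lookahead), (spt, lookahead), (lookahead,)]

    def observe(self, action, spt, sst, lookahead):
        for level, ctx in enumerate(self.levels(spt, sst, lookahead)):
            self.joint[level, ctx, action] += 1
            self.context[level, ctx] += 1

    def prob(self, action, spt, sst, lookahead):
        total, used = 0.0, 0.0
        for level, ctx in enumerate(self.levels(spt, sst, lookahead)):
            seen = self.context[level, ctx]
            if seen:  # back off: skip context levels never observed
                total += self.weights[level] * self.joint[level, ctx, action] / seen
                used += self.weights[level]
        # renormalize over the levels used; uniform over {shift, reduce} otherwise
        return total / used if used else 0.5


model = ActionModel()
for _ in range(3):
    model.observe("shift", "A", "B", "x")
model.observe("reduce", "A", "B", "x")
```

With these observations an unseen rich context still receives a usable estimate from the lookahead-only level, which is the purpose of backing off under sparse data.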



3 Experimental Results 

We used a treebank of 12,084 sentences annotated using the binary-branching
CFG [7]. We used 10,906 sentences as the training data and 1,178 sentences as
the test data. The average number of morphemes per sentence is 22.5. Table 1
shows parsing accuracies as contextual information is accumulated. The rich
contextual information used by our model improves the parsing accuracy by about
11-14% over the previous models. Besides, we observe that the proposed
shift-reduce parsing model outperforms the previous LR parsing models even when
using a lookahead alone (<1>). The reason is that our model need not have a
lookahead as requisite information, and thus it can be more flexible and more
robust against the data sparseness problem than the previous models.
Abstract. In this paper, we present the useful features of a syntactic
constituent for a probabilistic parsing model and analyze the combination
of the features in order to disambiguate parse trees effectively.
Unlike most previous work, which focuses on the features of a single head,
the features of a functional head, the features of a content head, and the
features of size are utilized in this study. Experimental results show that
the combination of different features, such as the functional head feature
and the size feature, is preferred to the combination of similar features,
such as the functional head feature and the content head feature. Besides,
it is remarkable that the function feature is more useful than the
combination of the content feature and the size feature.



1 Introduction 

Natural language parsing is regarded as the task of finding the parse tree for
a given sentence. A probabilistic approach such as PCFG selects the parse tree
with the highest probability, which is generated by the production rules.
However, it cannot select the best parse tree among the parse trees in Figure
1, because they are generated by the same CFG rules.

In order to improve the syntactic disambiguation, most recent parsing
models have been lexicalized [1,2,3,4] so that they can discriminate between
P(NP/mother -> NP/mother PP/in) and P(NP/portrait -> NP/portrait PP/in).
Besides, some of them also utilize the inner contexts [2,5], the outer contexts
[3] or the derivational history [4,6]. Still, the parse tree type selected in
Figure 1 is the same as the parse tree type selected for the noun phrase “the
portrait of my mother in oil”, since the previous models do not consider the
relationship between “portrait” and “oil”. Besides, the lexicalized model using
the derivational history cannot calculate the probability without a completed
parse because it cannot know what word is the head word of a parent or a
grandparent in a partial parse tree [4].

Fig. 1. Syntactic Ambiguities of a Noun Phrase by the Syntactic Tag

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 97-101, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

S.-Y. Park et al.

In this paper, we provide the useful features of a syntactic constituent for 
a parsing model and analyze the combination of the features in order to dis- 
ambiguate parse trees effectively. The rest of this paper is organized as follows. 
Section 2 explains a parsing model using the feature structure, and Section 3 
shows the experimental results of the feature combination. Finally, we conclude 
this paper in Section 4. 



2 A Parsing Model Using the Feature Structure 

Given a part-of-speech tagged sentence $w_1 w_2 \ldots w_k$, the best parse tree
is selected based on the probability of generating a parse tree, which is
calculated by multiplying the probabilities of all rules in the parse tree as
follows.

$$T_{best}(w_{1 \ldots k}) = \arg\max_{T} P(T) = \arg\max_{T} \prod_{i=1}^{k} P(n_{P_i} \to w_i) \prod_{i=k+1}^{j} P(n_{P_i} \to n_{L_i} \, n_{R_i})$$

where k is the number of unary rules, j is the number of all rules in the parse
tree, $n_{P_i}$ is the parent feature structure of the i-th rule, $n_{L_i}$ is
its left child, $n_{R_i}$ is its right child, and $w_i$ is the i-th
part-of-speech tagged word.
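
The selection of the best tree can be illustrated with a short Python sketch. It assumes a tree is either (parent, word) for a unary rule or (parent, left, right) for a binary rule, with nonterminals written as plain strings; the names `tree_log_prob` and `best_tree` and the rule probabilities are invented for the example.

```python
import math


def tree_log_prob(tree, rule_prob):
    """Sum of log rule probabilities over all rules used in the tree."""
    if len(tree) == 2:  # unary rule: parent -> word
        parent, word = tree
        return math.log(rule_prob[(parent, word)])
    parent, left, right = tree  # binary rule: parent -> left right
    return (math.log(rule_prob[(parent, left[0], right[0])])
            + tree_log_prob(left, rule_prob)
            + tree_log_prob(right, rule_prob))


def best_tree(candidates, rule_prob):
    """Candidate with the highest probability."""
    return max(candidates, key=lambda t: tree_log_prob(t, rule_prob))


# Invented probabilities, for illustration only.
rule_prob = {("A", "x"): 0.5, ("B", "y"): 0.5,
             ("S", "A", "B"): 0.2, ("S", "B", "A"): 0.1}
t1 = ("S", ("A", "x"), ("B", "y"))
t2 = ("S", ("B", "y"), ("A", "x"))
```

Here t1 has probability 0.2 x 0.5 x 0.5 = 0.05 and t2 has 0.025, so `best_tree([t1, t2], rule_prob)` returns t1. Richer nonterminal strings, for instance ones that encode head words or sizes, split rules that a plain syntactic tag would merge.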

A feature structure n includes the following seven features, as shown in
Figure 2. $syn_n$ describes the syntactic tag. $func^{pos}_n / func^{word}_n$
represent the part-of-speech tag and the word of its functional head.
$cont^{pos}_n / cont^{word}_n$ express the part-of-speech tag and the word of
its content head. $size^{sect}_n / size^{int}_n$ denote the section tag, such as
S (small), M (medium) or L (large), and the number of terminal words. For
example, a feature structure for the prepositional phrase “in white” contains
“P” as $syn_n$, “prep/in” as $func^{pos}_n / func^{word}_n$, “noun/white” as
$cont^{pos}_n / cont^{word}_n$, and “S/2” as $size^{sect}_n / size^{int}_n$.

Fig. 2. Syntactic Ambiguities of a Noun Phrase by the Feature Structure



In Figure 2, which shows the parse trees generated for the noun phrase “the
portrait of my mother in white”, a probabilistic parsing model can discriminate
between P(N/my/mother → N/my/mother P/in/white) and
P(N/the/portrait → N/the/portrait P/in/white). While the syntactic tag NP of
“my mother” is identical with the syntactic tag NP of “my mother in white” in
Figure 1, the feature structure of “my mother” represented by N/2 is not equal
to the feature structure of “my mother in white” represented by N/4 in
Figure 2 according to $size^{sect}_n / size^{int}_n$. In addition, the parse
tree selected in Figure 2 is not the same as the parse tree for “the portrait
of my mother in oil” because P(N/my/mother → N/my/mother P/in/white) is
different from P(N/my/mother → N/my/mother P/in/oil).
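
As a rough illustration of how feature-annotated nonterminals separate these
cases, the sketch below multiplies rule probabilities over a partial parse
tree. The class `Feat`, the function `tree_probability`, the reduced set of
four features, and every probability value are assumptions made for this
example only; they are not taken from the paper or its treebank.

```python
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class Feat:
    """Reduced, hypothetical feature structure for a nonterminal."""
    syn: str    # syntactic tag, e.g. "N" or "P"
    func: str   # word of the functional head
    cont: str   # word of the content head
    size: int   # number of terminal words

def tree_probability(rules, rule_prob):
    """Multiply the probabilities of all rules used in one parse tree."""
    return prod(rule_prob[rule] for rule in rules)

my_mother = Feat("N", "my", "mother", 2)
my_mother_pp = Feat("N", "my", "mother", 4)   # "my mother in white" / "in oil"
in_white = Feat("P", "in", "white", 2)
in_oil = Feat("P", "in", "oil", 2)

# Binary rules are (parent, left child, right child) triples.
attach_white = (my_mother_pp, my_mother, in_white)
attach_oil = (my_mother_pp, my_mother, in_oil)
shared = (Feat("P", "of", "mother", 5), Feat("P", "of", "of", 1), my_mother_pp)

# Made-up probabilities, for illustration only.
rule_prob = {attach_white: 0.02, attach_oil: 0.001, shared: 0.5}

p_white = tree_probability([shared, attach_white], rule_prob)
p_oil = tree_probability([shared, attach_oil], rule_prob)
print(my_mother == my_mother_pp)  # the size feature separates N/2 from N/4
print(p_white > p_oil)            # the content head separates "white" from "oil"
```

Because the nonterminals are compared as whole feature structures, two
constituents with the same syntactic tag can still receive different rule
probabilities.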



3 Evaluation of Feature Combination 

In order to evaluate the disambiguating power of each feature combination, we
select some features as elements of a nonterminal, as shown in Figure 3, where
func describes the functional head feature, cont expresses the content head
feature, and size means the size feature. We then measure the labeled
precision, the labeled recall, the cross brackets, and the exact matching of
the model using the feature combination [7]. The treebank of 31,080 Korean
sentences, which includes a wide variety of Korean sentences, is divided into
90% for the training set and 10% for the test set for experimentation. Also,
the test set is sorted according to the number of morphemes per sentence.

Fig. 3. The Experimental Results of the Feature Combination
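
The evaluation measures named above can be sketched as follows, assuming the
common bracket-based definitions in which a constituent is a
(label, start, end) triple. The exact definitions used in [7] may differ, and
the function names and the two bracketings of “the portrait of my mother in
white” are hypothetical.

```python
def labeled_scores(gold, predicted):
    """Labeled precision and recall over sets of (label, start, end)."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall

def crossing_brackets(gold, predicted):
    """Count predicted spans that partially overlap some gold span."""
    def crosses(a, b):
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1
    return sum(any(crosses(p, g) for g in gold) for p in predicted)

# Word positions: the(0) portrait(1) of(2) my(3) mother(4) in(5) white(6)
gold = {("NP", 0, 7), ("NP", 0, 2), ("PP", 2, 7),
        ("NP", 3, 7), ("NP", 3, 5), ("PP", 5, 7)}
predicted = {("NP", 0, 7), ("NP", 0, 5), ("NP", 0, 2),
             ("PP", 2, 5), ("NP", 3, 5), ("PP", 5, 7)}

precision, recall = labeled_scores(gold, predicted)
crossing = crossing_brackets(gold, predicted)
exact_match = gold == predicted
```

In this toy case four of the six brackets agree, and the two brackets produced
by the alternative attachment both cross a gold bracket.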

Figure 3 shows that the combination of the content feature, the function
feature, and the size feature performs best. Although the function feature is
best and the size feature is worst on the performance of a single feature, the
combination of them is better than the combination of the function feature and
the content feature. The reason is that the effect of the function feature may
overlap the effect of the content feature because the former, represented by a
word and its part-of-speech, is similar to the latter. Therefore, we can say
that the combination of different features is preferred to the combination of
similar features. Besides, it is remarkable that the function feature is more
useful than the combination of the content feature and the size feature.

4 Conclusion 

In this paper, we represent a syntactic constituent as the combination of a
syntactic feature, content features, functional features, and size features.
We then analyze the disambiguating power of each feature combination for a
probabilistic parsing model. Experimental results show that the combination of
different features such as the functional head feature and the size feature is
preferred to the combination of similar features such as the functional head
feature and the content head feature. Besides, it is remarkable that the
function feature is more useful than the combination of the content feature
and the size feature. In future work, we will try to improve the efficiency of
the parsing model and analyze in depth the relationship between the sparse
data problem and word frequency.

Abstract. We describe a context-free parsing algorithm to deal with
incomplete sentences, including unknown parts of unknown length. It
produces a finite shared-forest compiling all parses, often infinite in
number, that could account for both the error and the missing parts.
In contrast to previous works, we benefit from a finer dynamic
programming construction, leading to an improved computational
behavior. We also introduce a deductive construction, which has the
advantage of simplifying the task of description.



1 Introduction 

An ongoing question in the design of dialogue systems is how to provide the 
maximal coverage and understanding of the language, finding the interpretations 
that have maximal thresholds, when the computational process must be 
prompted immediately at the onset of new input. This is largely due to the fact 
that the user often does not know the type of questions that the system answers. 
In this sense, it is often better to have a system that tries to guess a specific 
interpretation in case of ambiguity rather than ask the user for a clarification. 
As a consequence, analysis of the utterance should continuously anticipate the 
interaction with the user, based on the expectations of the system. 

To comply with these requests, we need a parser which analyses the input 
simultaneously as it is entered, even when current data are only partially known. 
Two factors are at the origin of this behavior in natural language man-machine 
interfaces, whether text or speech-based. In the case of the former, the input 
language can only be approximately defined and individual inputs can vary 
widely from the norm [6] due to ungrammatical spontaneous phenomena. In the 
case of the latter [7], inputs can often only be considered as a distorted version of
any of several possible patterns resulting from an erroneous recognition process. 

In this context, our aim is computational. We restrict interaction types to
only those necessary for immediate understanding using a predictive model
based on the parsing algorithm for unrestricted context-free grammars (CFGs)
proposed by Vilares in [9]. In relation to previous works [8,3], our proposal
provides a formal definition framework and an improved computational behavior.

* Research partially supported by the Spanish Government under projects
TIC2000-0370-C02-01 and HP2002-0081, and the Autonomous Government of Galicia
under projects PGIDT01PXI10506PN, PGIDIT02PXIB30501PR and PGIDIT02SIN01E.

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 102-111, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004



2 The Standard Parser 



Our aim is to parse a sentence $w_{1 \ldots n} = w_1 \ldots w_n$ according to an
unrestricted CFG $\mathcal{G} = (N, \Sigma, P, S)$, where the empty string is
represented by $\varepsilon$. We generate from $\mathcal{G}$ a push-down
transducer (PDA) for the language $\mathcal{L}(\mathcal{G})$. In practice, we
choose an LALR(1) device generated by ICE [9], although any shift-reduce
strategy is adequate. A PDA is a 7-tuple
$\mathcal{A} = (Q, \Sigma, \Delta, \delta, q_0, Z_0, Q_f)$ where: $Q$ is the set
of states, $\Sigma$ the set of input symbols, $\Delta$ the set of stack
symbols, $q_0$ the initial state, $Z_0$ the initial stack symbol, $Q_f$ the set
of final states, and $\delta$ a finite set of transitions of the form
$\delta(p, X, a) \ni (q, Y)$ with $p, q \in Q$, $a \in \Sigma \cup \{\varepsilon\}$
and $X, Y \in \Delta \cup \{\varepsilon\}$. Let the PDA be in a configuration
$(p, X\alpha, ax)$, where p is the current state, $X\alpha$ is the stack
contents with X on the top, and ax is the remaining input where the symbol a
is the next to be shifted, $x \in \Sigma^*$. The application of
$\delta(p, X, a) \ni (q, Y)$ results in a new configuration $(q, Y\alpha, x)$
where a has been scanned, X has been popped, and Y has been pushed.
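
The effect of a single transition on a configuration can be sketched as below.
This is only an illustration of the definition above, not the ICE
implementation: the dictionary encoding of the transition set, the use of the
empty string for the empty symbol, and the function name `step` are
assumptions.

```python
EPS = ""  # stands for the empty symbol (an encoding chosen for this sketch)

def step(config, delta):
    """Return every configuration reachable by one transition.

    config is (state, stack, remaining_input); stack and input are tuples
    with the stack top and the next input symbol at index 0.
    delta maps (p, X, a) to a set of (q, Y) pairs.
    """
    state, stack, remaining = config
    results = []
    for (p, X, a), targets in delta.items():
        if p != state:
            continue
        if X != EPS and (not stack or stack[0] != X):
            continue  # X must be on top of the stack
        if a != EPS and (not remaining or remaining[0] != a):
            continue  # a must be the next input symbol
        rest_stack = stack[1:] if X != EPS else stack          # pop X
        rest_input = remaining[1:] if a != EPS else remaining  # scan a
        for q, Y in targets:
            new_stack = (Y,) + rest_stack if Y != EPS else rest_stack  # push Y
            results.append((q, new_stack, rest_input))
    return results

delta = {("p", "X", "a"): {("q", "Y")}}
print(step(("p", ("X", "A"), ("a", "b")), delta))
```

A non-deterministic PDA may return several configurations here, which is the
situation the item table of the next paragraphs is designed to handle without
copying stacks.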

To get polynomial complexity, we avoid duplicating stack contents when
ambiguity arises. We determine the information we need to trace in order
to retrieve it [4]. This information is stored in a table $\mathcal{I}$ of
items, $\mathcal{I} = \{[q, X, i, j],\ q \in Q,\ X \in \{\varepsilon\} \cup \{\nabla_{r,s}\},\ 0 \le i \le j\}$;
where q is the current state, X is the top of the stack, and the positions i
and j indicate the substring $w_{i+1} \ldots w_j$ spanned by the last terminal
shifted to the stack or by the last production reduced. The symbol
$\nabla_{r,s}$ indicates that the part $A_{r,s+1} \ldots A_{r,n_r}$ of a rule
$A_{r,0} \rightarrow A_{r,1} \ldots A_{r,n_r}$ has been recognized.

We describe the parser using parsing schemata [5]; a triple
$(\mathcal{I}, \mathcal{H}, \mathcal{D})$, with $\mathcal{I}$ the table of items
previously defined, $\mathcal{H} = \{[a, i, i+1],\ a = w_{i+1}\}$ an initial set
of triples called hypotheses that encodes the sentence to be parsed¹, and
$\mathcal{D}$ a set of deduction steps that allow new items to be derived from
already known items. Deduction steps are of the form
$\{\eta_1, \ldots, \eta_k \vdash \xi\ /\ conds\}$, meaning that if all
antecedents $\eta_i \in \mathcal{I}$ are present and the conditions conds are
satisfied, then the consequent $\xi \in \mathcal{I}$ should be generated. In
the case of ICE,
$\mathcal{D} = \mathcal{D}^{Init} \cup \mathcal{D}^{Shift} \cup \mathcal{D}^{Sel} \cup \mathcal{D}^{Red} \cup \mathcal{D}^{Head}$,
where:



\[
\begin{aligned}
\mathcal{D}^{Init} &= \{\ \vdash [q_0, \varepsilon, 0, 0]\ \} \\
\mathcal{D}^{Shift} &= \{ [q, X, i, j] \vdash [q', \varepsilon, j, j+1]\ /\ \exists\, [a, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, a) \} \\
\mathcal{D}^{Sel} &= \{ [q, \varepsilon, i, j] \vdash [q, \nabla_{r,n_r}, j, j]\ /\ \exists\, [a, j, j+1] \in \mathcal{H},\ reduce_r \in action(q, a) \} \\
\mathcal{D}^{Red} &= \{ [q, \nabla_{r,s}, k, j]\,[q', \varepsilon, i, k] \vdash [q', \nabla_{r,s-1}, i, j]\ /\ q' \in reveal(q) \} \\
\mathcal{D}^{Head} &= \{ [q, \nabla_{r,0}, i, j] \vdash [q', \varepsilon, i, j]\ /\ q' \in goto(q, A_{r,0}) \}
\end{aligned}
\]

with $q_0 \in Q$ the initial state, and action and goto entries in the PDA
tables [1]. We say that $q' \in reveal(q)$ iff $\exists\, Y \in N \cup \Sigma$
such that $shift_q \in action(q', Y)$ or $q \in goto(q', Y)$, that is, when
there exists a transition from q' to q in $\mathcal{A}$. This set is equivalent
to the dynamic interpretation of non-deterministic PDAs:

¹ The empty string, $\varepsilon$, is represented by the empty set of
hypotheses, $\emptyset$. An input string $w_{1 \ldots n}$, $n \ge 1$, is
represented by $\{[w_1, 0, 1], [w_2, 1, 2], \ldots, [w_n, n-1, n]\}$.

- A deduction step Init is in charge of starting the parsing process.

- A deduction step Shift corresponds to pushing a terminal a onto the top of
the stack when the action to be performed is a shift to state q'.

- A step Sel corresponds to pushing the $\nabla_{r,n_r}$ symbol onto the top of
the stack in order to start the reduction of a rule r.

- The reduction of a rule of length $n_r > 0$ is performed by a set of $n_r$
steps Red, each of them corresponding to a pop transition replacing the two
elements $\nabla_{r,s} X_{r,s}$ placed on the top of the stack by the element
$\nabla_{r,s-1}$.

- The reduction of a rule r is finished by a step Head corresponding to a swap
transition that recognizes the top element $\nabla_{r,0}$ as equivalent to the
left-hand side $A_{r,0}$ of that rule, and performs the corresponding change of
state.

These steps are applied until no new items can be generated. The splitting of
reductions into a set of Red steps allows us to share computations
corresponding to partial reductions, attaining a worst case time (resp. space)
complexity $O(n^3)$ (resp. $O(n^2)$) with respect to the length n of the input
string [9]. The input string is recognized iff the final item
$[q_f, \nabla_{0,0}, 0, n+1]$, $q_f \in Q_f$, is generated.
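
The phrase "applied until no new items can be generated" describes a fixpoint
computation. The sketch below shows one generic way to organize it with an
agenda; it does not reproduce the ICE deduction steps. The functions `closure`
and `combine`, and the toy items (pairs of positions), are assumptions used
only to show the control loop.

```python
def closure(axioms, steps):
    """Apply deduction steps until no new item can be derived."""
    chart, agenda = set(), list(axioms)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        for step in steps:
            # Materialize the consequents before the chart changes again.
            for new_item in list(step(item, chart)):
                if new_item not in chart:
                    agenda.append(new_item)
    return chart

def combine(item, chart):
    """Toy binary step: [i, j] and [j, k] derive [i, k]."""
    i, j = item
    for a, b in chart:
        if a == j:
            yield (i, b)
        if b == i:
            yield (a, j)

hypotheses = [(0, 1), (1, 2), (2, 3)]
items = closure(hypotheses, [combine])
print(sorted(items))
```

Each item is stored once in the chart, so a consequent reached by several
derivations is shared rather than recomputed, which is the property the
complexity bounds above rely on.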

When the sentence has several distinct parses, the set of all possible parse 
chains is represented in finite shared form by a CFG that generates that possibly 
infinite set, which is equivalent to using an AND-OR graph. In this graph, AND- 
nodes correspond to the usual parse-tree nodes, while OR-nodes correspond to 
ambiguities. Sharing of structures is represented by nodes accessed by more than 
one other node, and may correspond to sharing of a complete subtree, but also 
to sharing of a part of the descendants of a given node. 

3 Parsing Incomplete Sentences 

In order to handle incomplete sentences, we extend the input alphabet.
Following Lang in [3], we introduce two new symbols: “?” stands for one unknown
word symbol, and “*” stands for an unknown sequence of input word symbols.

3.1 The Parsing Algorithm 

Once the parser detects that the next input symbol to be shifted is one of
these two extra symbols, we apply the set of deduction steps
$\mathcal{D}_{incomplete}$, which includes the following two sets of deduction
steps:

\[
\begin{aligned}
\mathcal{D}^{Shift}_{incomplete} &= \{ [q, \varepsilon, i, j] \vdash [q', \varepsilon, j, j+1]\ /\ \exists\, [?, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, a),\ a \in \Sigma \} \\
\mathcal{D}^{Loop\_shift}_{incomplete} &= \{ [q, \varepsilon, i, j] \vdash [q', \varepsilon, j, j]\ /\ \exists\, [*, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, X),\ X \in N \cup \Sigma \}
\end{aligned}
\]

while we maintain the rest of the deduction steps in $\mathcal{D}^{Init}$,
$\mathcal{D}^{Shift}$, $\mathcal{D}^{Sel}$, $\mathcal{D}^{Red}$, and
$\mathcal{D}^{Head}$. From an intuitive point of view,
$\mathcal{D}^{Shift}_{incomplete}$ applies any shift transition independently of
the current lookahead available, provided that this transition is applicable
with respect to the PDA configuration and that the next input symbol is an
unknown token. In turn, $\mathcal{D}^{Loop\_shift}_{incomplete}$ applies any
valid shift action on terminals or variables to items corresponding to PDA
configurations for which the next input symbol denotes an unknown sequence of
tokens. Given that in this latter case new items are created in the same
starting itemset, shift transitions may be applied any number of times to the
same computation thread, without scanning the input string.

All deduction steps dealing with incomplete sentences are applied until
a parse branch links up to the right context by using a standard shift action,
resuming the standard parse mode. In this process, when we deal with sequences
of unknown tokens, we can generate nodes deriving only “*” symbols. This
over-generation is of no interest in most practical applications and introduces
additional computational work, which entails an extra loss of parse efficiency.
So, our goal is to replace these variables with the unknown subsequence
terminal, “*”. We solve this problem by extending the item structure with an
insertion counter that tabulates the number of syntactic and lexical categories
used to rebuild the incomplete sentence. When several items representing the
same node are generated, only those with the minimal number of insertions are
saved; the rest are eliminated and pruned from the output parse shared-forest.
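
The pruning criterion can be sketched as a table lookup keyed by the node an
item represents. This is a minimal illustration under the assumption that an
item is a tuple whose last component is the insertion counter; the function
name `add_item` and the item values are hypothetical, and a full parser would
also have to re-propagate an improved counter to the items derived from it.

```python
def add_item(table, item):
    """Keep an item only if it has the fewest insertions seen for its node."""
    *node, insertions = item
    key = tuple(node)
    best = table.get(key)
    if best is not None and best <= insertions:
        return False          # subsumed by an item already in the table
    table[key] = insertions   # new node, or a cheaper way to build it
    return True

table = {}
first = add_item(table, ("q1", "eps", 2, 2, 3))    # new node
cheaper = add_item(table, ("q1", "eps", 2, 2, 1))  # fewer insertions: kept
worse = add_item(table, ("q1", "eps", 2, 2, 2))    # more insertions: pruned
```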
Fig. 1. Number of items for the noun's example

Formally, items extended with counters, e, are of the form [p, X, i, j, e] and,
to deal with them, we should redefine the set of deduction steps
$\mathcal{D}_{incomplete}$ as follows:

\[
\begin{aligned}
\mathcal{D}^{Shift}_{incomplete} &= \{ [q, \varepsilon, i, j, e] \vdash [q', \varepsilon, j, j+1, e + I(a)]\ /\ \exists\, [?, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, a),\ a \in \Sigma \} \\
\mathcal{D}^{Loop\_shift}_{incomplete} &= \{ [q, \varepsilon, i, j, e] \vdash [q', \varepsilon, j, j, e + I(X)]\ /\ \exists\, [*, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, X),\ X \in N \cup \Sigma \}
\end{aligned}
\]



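where $I(X)$ is the insertion cost for $X \in N \cup \Sigma$, and we have to
adapt the previous deduction steps to deal with counters:

\[
\begin{aligned}
\mathcal{D}^{Init}_{count} &= \{\ \vdash [q_0, \varepsilon, 0, 0, 0]\ \} \\
\mathcal{D}^{Shift}_{count} &= \{ [q, X, i, j, e] \vdash [q', \varepsilon, j, j+1, e]\ /\ \exists\, [a, j, j+1] \in \mathcal{H},\ shift_{q'} \in action(q, a) \} \\
\mathcal{D}^{Sel}_{count} &= \{ [q, \varepsilon, i, j, e] \vdash [q, \nabla_{r,n_r}, j, j, e]\ /\ \exists\, [a, j, j+1] \in \mathcal{H},\ reduce_r \in action(q, a) \} \\
\mathcal{D}^{Red}_{count} &= \{ [q, \nabla_{r,s}, k, j, e]\,[q', \varepsilon, i, k, e'] \vdash [q', \nabla_{r,s-1}, i, j, e + e']\ /\ q' \in reveal(q) \} \\
\mathcal{D}^{Head}_{count} &= \{ [q, \nabla_{r,0}, i, j, e] \vdash [q', \varepsilon, i, j, e]\ /\ q' \in goto(q, A_{r,0}) \}
\end{aligned}
\]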



As for the standard mode, these steps are applied until no new items can be
generated. The resulting complexity bounds are also, in the worst case,
$O(n^3)$ and $O(n^2)$ for time and space, respectively, with respect to the
length n of the input string. The parse is defined by the final item
$[q_f, \nabla_{0,0}, 0, n+1, e]$, $q_f \in Q_f$.



3.2 Previous Works 

Both Tomita et al. [8] and Lang [3] apply dynamic programming techniques
to deal with non-determinism in order to reduce space complexity and improve
computational efficiency. However, the approach is different in each case:

- From the point of view of the descriptive formalism, Lang’s proposal
generalizes that of Tomita et al. In effect, in order to solve the problems
derived from grammatical constraints, Earley’s construction [2] is extended by
Lang to PDAs, separating the execution strategy from the implementation of the
interpreter. Tomita et al.’s work can be interpreted as simply a specification
of Lang’s for LR(0) PDAs.

- From the point of view of the operational formalism, Lang introduces items
as fragments of the possible PDA computations that are independent of the
initial content of the stack, except for its two top elements, allowing partial
sharing of common fragments in the presence of ambiguities. This relies
on the concept of dynamic frame for CFGs [9], for which the transitional
mechanism is adapted to be applied over these items. Tomita et al. use a
shared-graph based structure to represent the stack forest, which improves
the computational efficiency at the expense of practical space cost.

- Neither Lang nor Tomita et al. avoid over-generation in nodes deriving only
“*” symbols. In this regard, only Lang includes a complementary
simplification phase to eliminate these nodes from the output parse shared
forest. In addition, these authors do not provide details about how to deal
with these nodes when they are generated from more than one parse branch,
which is usual in a non-deterministic frame.

Our proposal applies Lang’s descriptive formalism to the particular case of an
LALR(1) parsing scheme, which makes lookahead computation easier, whilst
maintaining the state splitting phenomenon at reasonable levels. This ensures
a good sharing of computation and parsing structures, leading to an increase
in efficiency. In relation to Tomita et al.’s strategy, our deterministic
domain is larger and, in consequence, the time complexity for the parser is
linear for a larger number of grammars.

With regard to the operational formalism, we work in a dynamic frame $S^1$,
which means that our items only represent the top of the stack. This implies
a difference with Lang’s proposal, or implicitly Tomita et al.’s, which used
$S^2$. From a practical point of view, $S^1$ translates into a better sharing
of both syntactic structures and computations, and improved performance.

Fig. 2. Shared-forest for the noun’s example

Finally, we solve both the consideration of an extra simplification phase and
the over-generation on unknown sequences by considering a simple subsumption
criterion over items including error counters.

4 Experimental Results 

We consider the language of pico-English to illustrate our discussion,
comparing our proposal on ICE [9], with Lang [3] and Tomita et al.’s
algorithm [8]. As grammatical formalism, we take the following set of rules:

S → NP VP      NP → det noun      VP → verb NP
S → S PP       NP → NP PP         PP → prep NP

generating the language. Tests have been applied on input strings of two types:

$$\text{det ? verb det noun prep det noun } \{\text{* noun}\}^{i}\ \{\text{prep det noun}\}^{8-i} \text{ prep det noun} \qquad (1)$$

$$\text{det ? verb det noun prep } \{\text{* prep}\}^{i}\ \{\text{det noun prep}\}^{8-i} \text{ det noun} \qquad (2)$$

where i represents the number of tokens “*”, that is, the number of unknown
sentences in the corresponding input string. This could correspond, for
example, to concrete input strings of the form:

$$\text{The ? gives the cake to the friend } \{\text{* friend}\}^{i}\ \{\text{of the friend}\}^{8-i} \text{ of the boy} \qquad (3)$$

$$\text{The ? gives the cake to } \{\text{* of}\}^{i}\ \{\text{the friend of}\}^{8-i} \text{ the boy} \qquad (4)$$

respectively. As our running grammar contains rules “NP → NP PP” and “PP
→ prep NP”, these incomplete sentences have a number of cyclic parses which
grows exponentially with i. This number is:

$$C_0 = C_1 = 1 \quad \text{and} \quad C_i = \binom{2i}{i} \frac{1}{i+1}, \ \text{if } i > 1$$
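
The closed form above is the Catalan number, and its growth can be checked
numerically. The short sketch below evaluates it for i = 0, ..., 8, the range
of i allowed by the exponents 8 - i in the test strings; the function name
`catalan` is chosen for this example.

```python
from math import comb

def catalan(i):
    """Number of parses C_i = binom(2i, i) / (i + 1), as an exact integer."""
    return comb(2 * i, i) // (i + 1)

counts = [catalan(i) for i in range(9)]
print(counts)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```

The formula also yields 1 for i = 0 and i = 1, so the separate base cases in
the text agree with it.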



In effect, the parser must simulate the analysis of an arbitrary number of tokens 
and, in consequence, it is no longer limited by the input string. At this point, the 
parser may apply repeatedly the same reductions over the same grammar rules. 
So, although the running grammar is not cyclic, the situation generated is close 
to this kind of framework. More exactly, in dealing with unknown sentences, we 
can derive a non-terminal from itself without extra scan actions on the input 
string. This allows us to evaluate our proposal in a strongly ambiguous context 
with cycles, in spite of the simplicity of the grammar. 
Fig. 3. Number of items for the prep’s example



The essential experimental results are shown in Fig. 1 (resp. Fig. 3) in
relation to running example 1 (resp. example 2), for which the output
shared-forests are shown in Fig. 2 (resp. Fig. 4). Since the number of possible
tree combinations in these forests is exponential, these figures focus only on
particular examples. In all cases our reference for measuring efficiency is the
number of items generated by the system during the parsing process, rather than
purely temporal criteria, which are more dependent on the implementation. The
shared-forests represented clearly show the existence of a cyclic behavior and
ambiguous analyses.

At this point, we are comparing three dynamic frames. The classic one, $S^T$,
is comparable to parse methods based on backtracking and including some kind
of mechanism to detect cycles. In this case, no sharing of computations and
structures is possible, and it is of only theoretical interest. The other two
dynamic frames, $S^1$ and $S^2$, are of real practical interest. The first one
is considered by ICE, while $S^2$ can be identified in these tests with Lang’s
and Tomita et al.’s results.

Fig. 4. Shared-forest for the prep’s example



In order to allow an objective comparison to be made between all proposals
considered, we have used a uniform parsing schema. So, although Lang’s
algorithm can be applied to any parse strategy, and Tomita et al.’s was
originally intended for LR(0) PDAs, we have adapted both of them to deal with
an LALR(1) scheme, as used by ICE. In all cases, these experimental results
illustrate the superior performance of our proposal, ICE, in relation to
previous strategies. This is due to the following causes:



- We do not need a supplementary simplification phase in order to eliminate
nodes deriving only unknown sequences, “*”, from the output
structure.







M. Vilares, V.M. Darriba, and J. Vilares 



- The choice of $S^1$ instead of $S^2$ as dynamic frame provides a better
sharing efficiency for both structures and computations. As a consequence, the
number of items generated is smaller.

In order to illustrate the cost of the previously mentioned simplification
phase used by Lang and Tomita et al., Fig. 5 shows the number of items to be
eliminated in this process for both examples, noun’s and prep’s. We include
this estimation for $S^2$, the original dynamic frame for these proposals, and
$S^1$. In this last case, we have previously adapted the original methods of
Lang and Tomita et al.

5 Conclusions 

Dialogue systems should provide total understanding of the input. However, 
in practice, this is not always possible with current technology, even when 
we restrict ourselves to the treatment of a limited domain of knowledge. In 
consequence, robustness becomes crucial in order to find a suitable interpretation 
for the utterance, and we are forced to compute hypotheses to guarantee the 
interactivity in this kind of frames. So, parsing of incomplete sentences is a 
fundamental task in a variety of man-machine interfaces, as part of the more 
general and complex robust parsing activity. This is the case of speech-based 
systems, where the language often appears to contain noise derived from human 
causes such as a stutter or a cough; or even mechanical ones due to an imperfect 
signal recognition. 
Fig. 5. Items to be pruned in the simplification phase

In this context, our proposal provides an improved treatment of the 
computation, avoiding extra simplification phases used in previous proposals 
and profiting from the concept of dynamic frame. In particular, this allows the 
sharing of computations and structures, reducing the amount of data to be taken 
into account as well as the work necessary to manipulate them. 

Abstract. We investigate the effect of unlexicalization in a dependency
parser for variable word order languages and propose an unlexicalized
parser which can utilize some contextual information in order to achieve
performance comparable to that of lexicalized parsers. Unlexicalization
of an early dependency parser makes performance decrease by 3.6%.
However, when we modify the unlexicalized parser into one which
can consider additional contextual information, the parser performs
better than some lexicalized dependency parsers, while it requires simpler
smoothing processes and less time and space for parsing.



1 Introduction 

Lexical information has been widely used to achieve a high degree of parsing
accuracy, and parsers with lexicalized language models [1,2,3] have shown
state-of-the-art performance in analyzing English. Most parsers developed
recently use lexical features for syntactic disambiguation, whether they use a
phrase structure grammar or a dependency grammar, regardless of the languages
they deal with.

However, some researchers have recently argued that lexicalization does not
play a big role in parsing with probabilistic context-free grammars (PCFG). [4]
showed that lexical bigram information does not contribute to the performance
improvement of a parser. [5] concluded that the fundamental sparseness
of the lexical dependency information from parsed training corpora is not
helpful to the lexicalized parser, and proposed an accurate unlexicalized
parsing model.

This is the story of analyzing fixed word order languages, e.g. English, with
a phrase structure grammar. What about parsing other types of languages with
other types of grammars, without lexical dependency information? For instance,
can an unlexicalized dependency parser for languages with variable word order
achieve accuracy as high as the unlexicalized PCFG parser for English does?

This paper investigates the effect of the unlexicalization in a dependency 
parser for variable word order languages and suggests a new unlexicalized parser 
which can solve the problems of the unlexicalized dependency parser. 

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 112-123, 2004. 

(c) Springer- Verlag Berlin Heidelberg 2004 
Table 1. Effect of unlexicalization. The lexicalized parser uses Equation (2), while the
unlexicalized parser uses Equation (3)

               Lexicalized (F1-score)   Unlexicalized (F1-score)
Training Set   0.996                    0.801
Testing Set    0.837                    0.801



2 The Effect of Unlexicalizing Dependency Parser 

It seems that lexicalization may play a greater role in dependency parsing for
variable word order languages than in parsing fixed word order languages with a
phrase structure grammar. There are two reasons. First, dependency parsers
cannot use information on constituents because the grammar is based on the word
unit, not on the constituent. Therefore, the disambiguation depends more on the
lexical dependency between words. Secondly, since the word order is variable,
which means there is less restriction, it requires higher level information
such as semantic constraints or lexical preferences to offset the absence of
word order information.

To investigate the effect of unlexicalization, we implemented a parser in the
style of [6], which uses bigram lexical dependencies and a distance measure for
syntactic disambiguation. The parsing model for a sentence S is:

$$P(t \mid S) \approx \prod_{i=1}^{|S|-1} P(dep_i = h(i) \mid S) \qquad (1)$$

where |S| is the number of words in S, h(i) is the modifyee of $w_i$, the i-th
word of S, and $dep_i$ is a dependency relation from $w_i$ to $w_{h(i)}$. The
modifyees for each word are stated in the tree t, which is a set of modifyees.
The probability of each dependency relation is:

$$P(dep_i = h(i) \mid S) \approx P(link(i, h(i)) = Yes \mid w_i\, w_{h(i)}\, \Delta_{i,h(i)}) \qquad (2)$$

$$link(x, y) = \begin{cases} Yes & \text{if } w_x \text{ modifies } w_y \\ No & \text{otherwise} \end{cases}$$

where $\Delta$ is a number of features that consider the distance between the
two depending words. The unlexicalized model is induced by substituting
part-of-speech (POS) tags t for all lexical words w in (2):

$$P(dep_i = h(i) \mid S) \approx P(link(i, h(i)) = Yes \mid t_i\, t_{h(i)}\, \Delta_{i,h(i)}) \qquad (3)$$

We trained both models (2) and (3) on about 27,000 sentences and tested
their performance on held-out testing data. The results are shown in Table 1.
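
A minimal sketch of the unlexicalized model (3) inside the product (1) is given
below. It assumes plain relative-frequency estimation with no smoothing and
reduces the distance features to a single signed distance, neither of which is
stated in the paper; the function names and the toy training examples are
hypothetical.

```python
from collections import Counter
from math import prod

def train(examples):
    """examples: (modifier tag, modifyee tag, distance, is_link) tuples."""
    yes, total = Counter(), Counter()
    for t_mod, t_head, dist, linked in examples:
        total[(t_mod, t_head, dist)] += 1
        if linked:
            yes[(t_mod, t_head, dist)] += 1
    return yes, total

def p_link(model, t_mod, t_head, dist):
    """Relative-frequency estimate of P(link = Yes | tags, distance)."""
    yes, total = model
    key = (t_mod, t_head, dist)
    return yes[key] / total[key] if total[key] else 0.0

def tree_probability(model, tags, heads):
    """Product over all words except the root, whose head is None."""
    return prod(p_link(model, tags[i], tags[h], h - i)
                for i, h in enumerate(heads) if h is not None)

model = train([("N", "V", 1, True), ("N", "V", 1, False), ("A", "N", 1, True)])
p_tree = tree_probability(model, ["A", "N", "V"], [1, 2, None])
```

Replacing the tags in the conditioning key by words would give the lexicalized
variant (2), at the cost of many more unseen keys.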

Overfitting causes the lexicalized parser to perform extremely well on the
training data. On the testing set, unlexicalization hurts the parsing
performance by 3.6% absolute, which is a sharper drop than the decrease caused
by the unlexicalization of a PCFG parser for English, which was reported in [4].

Despite the poor performance of the unlexicalized dependency parser for
variable word order languages, using an unlexicalized parser has considerable
advantages. First, we can simplify the smoothing processes for estimating
probabilities, which are designed to alleviate lexical data sparseness
problems. Consequently, parsing speed increases. In addition, eliminating
lexical data reduces the space complexity.

So we designed a new unlexicalized parser that considers more contexts for
syntactic disambiguation, yet can parse more accurately.

3 Revising the Unlexicalized Parser 

We observed that even a variable word order language generates some fixed POS 
tag sequence pattern in a local context. Based on this observation, we use local 
contexts of modifier and modifyee in estimating the likelihood of dependency 
between the two POS tags, and the likelihood of a length of modification relation 
from the modifier. We use the Korean language 1 , which allows variable word 
order, for explaining our ideas. 



3.1 Word Dependency Probability with Local Context 

In the research on a phrase structural parsing model for Korean [7], the outer
contexts of constituents were found to be useful for syntactic disambiguation.
We use a similar method for our dependency parser. In other words, we consider
outer contexts of a dependency relation, instead of the constituent, when we
estimate the word dependency probability:

$$P(link(i,j) = Yes \mid w_i\, w_j) \approx P(link(i,j) = Yes \mid t_i\, t_j\, \Phi_i\, \Phi_j) \qquad (4)$$

$$\approx P(link(i,j) = Yes \mid t_i\, t_j\, t_{i-1}\, t_{j+1}) \qquad (5)$$

where $\Phi_i$ is contextual information of $w_i$. According to [7],
considering two POS tags, one at the left and one at the right of a
constituent, was sufficient for improving parsing performance. So we
substitute a single POS tag for each context, i.e. (5).


3.2 Modification Distance Probability Based on Local Contextual 
Pattern 

We observed that a word tends to have a fixed modification distance in a
certain context. Consider the example in Figure 1. The word na-ui (I-GEN²),
which is a noun modifier, has two alternative noun modifyee candidates:
aideul-ege (children-DAT) and yeobseo-leul (postcard-OBJ). Here, the first
candidate is the correct modifyee for the modifier. It is well known to Korean
users that a word ending with the morpheme -ui (genitive postposition) usually
modifies the word immediately to its right. In other words, a word ending with
the genitive marker -ui prefers a modification distance of 1 in a general
context. Some rule-based or heuristic-based parsers encode this preference as
a rule for syntactic disambiguation.

¹ Readers who are unfamiliar with Korean syntax may refer to the Appendix at
the end of this paper for a brief introduction to Korean syntax.

                   cand. 1        cand. 2
Tom-i    na-ui     aideul-ege     yeobseo-leul   sseosda.
Tom-SBJ  I-GEN     children-DAT   postcard-OBJ   wrote.

Fig. 1. The sentence means “Tom wrote a postcard to my children.” The word
na-ui has two alternative modifyee candidates

                            cand. 1               cand. 2
gongyeon-eun  wanjeonhi     silpaeha-n  gut-euro  deurona-tda
(show-SBJ     completely    failed      that      was revealed)
1             2             3           4         5

modification distance probability:
P(dist = 1 | ...) : 94.82%    P(dist > 2 | ...) : 5.17%

Fig. 2. The sentence can be interpreted as “It was revealed that the show had
completely failed.” in English. The arcs at the bottom of the sentence show the
modification distance probability, which is proposed in this paper

Consider another similar, but more complex example in Figure 2. The adverb
wanjeonhi (completely) has two alternative modifyee candidates in this sentence.
They are silpaeha-n (failed) and deurona-tda (was revealed), and the former is
the correct modifyee of the adverb. Finding the correct modifyee is difficult in
this case, even if we consider lexical or semantic information, because the
lexical or semantic preferences of the adverb wanjeonhi for both modifyee
candidates are similar.

We define a modification distance probability to solve this problem. It is the
likelihood of the preferred length of a modification relation for a certain
modifier in a certain context, and it reflects the following two preferences:

1. Whether a modifier prefers long distance modification or local (short
distance) modification.

2. If a modifier prefers local modification, which word in the local context is
preferred as its modifyee.

2 SBJ, GEN, DAT, and OBJ stand for the subject, genitive, dative, and object
case markers, respectively.

The probability of a certain modification distance x for the given modifier
word $w_i$ and its surrounding context $\Phi_i$ is:

$$P(len = x \mid w_i \Phi_i) \approx P(len = x \mid w_{i-m} \cdots w_{i+n}) \approx P(len = x \mid t_{i-m} \cdots t_{i+n}) \qquad (6)$$

where the constants m and n are empirically determined. The length of the
modification relation x is calculated with the function $\Psi(ld)$, that is

$$\Psi(ld) = \begin{cases} ld & \text{if } ld < k \\ long & \text{otherwise} \end{cases} \qquad (\Psi \in Dist = \{1, \ldots, k-1, long\})$$

where ld is the linear distance between the two words of a dependency relation.
The constant k is the yardstick for deciding whether a dependency relation is
short or long. We call (6) the modification distance probability and use it
instead of the distance measure in (3).

As an example that uses this probability, revisit Figure 2, which shows the
probabilities for each modification distance 3. The probabilities are
calculated with the modification distance probability 4.

P(len = 1 | mag pvg-etm nbn-jca) = 94.82%
P(len = 2 | mag pvg-etm nbn-jca) = 0%
P(len = long | mag pvg-etm nbn-jca) = 5.17%
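The estimation behind these figures can be illustrated with a minimal sketch.
It assumes a plain relative-frequency estimate over a dependency-tagged corpus,
which the text does not spell out; the function names, the data layout, and the
placeholder tags T0 and T4 are hypothetical, and boundary padding and smoothing
are left out.

```python
from collections import Counter, defaultdict

LONG = "long"

def bucket(ld, k):
    """Psi(ld): map a linear distance to a class in {1, ..., k-1, long}."""
    return ld if ld < k else LONG

def train_mdp(sentences, m=0, n=2, k=3):
    """Count distance classes for each POS context t_{i-m} ... t_{i+n}.

    Each sentence is a pair (tags, heads): tags[i] is the POS tag of word i,
    heads[i] is the index of its modifyee (None for the sentence-final word).
    """
    counts = defaultdict(Counter)
    for tags, heads in sentences:
        for i, h in enumerate(heads):
            if h is None:
                continue
            context = tuple(tags[max(0, i - m): i + n + 1])
            counts[context][bucket(h - i, k)] += 1
    return counts

def mdp(counts, context, x):
    """Relative-frequency estimate of P(len = x | context); 0.0 if unseen."""
    seen = counts.get(context, Counter())
    total = sum(seen.values())
    return seen[x] / total if total else 0.0

# Toy corpus with one sentence shaped like Figure 2 (assumed head indices).
toy = [(["T0", "mag", "pvg-etm", "nbn-jca", "T4"], [2, 2, 3, 4, None])]
counts = train_mdp(toy)
p_short = mdp(counts, ("mag", "pvg-etm", "nbn-jca"), 1)
```

With m = 0 and n = 2 the context of the adverb is its own tag plus the two tags
to its right, which matches the conditioning shown in the three probabilities
above.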

3.3 The Probabilistic Dependency Parsing Model 

A dependency parsing model estimates the probability of a parse tree t for a
given sentence S.

$$P(t \mid S) \approx \prod_{i < |S|} P(dep_i = h(i) \mid S) \qquad (7)$$

We assume that a dependency relation depends only on the two words that are
linked by the relation and on their local contexts. This turns (7) into (8).

$$P(dep_i = h(i) \mid S) \approx \frac{P(link(i, h(i)) = Yes,\ len = \Psi(h(i) - i) \mid w_i \Phi_i w_{h(i)} \Phi_{h(i)})}{\sum_{k > i,\ x \in \{Yes, No\},\ y \in Dist} P(link(i, k) = x,\ len = y \mid w_i \Phi_i w_{h(i)} \Phi_{h(i)})} \qquad (8)$$

$$= P(link(i, h(i)) = Yes \mid w_i \Phi_i w_{h(i)} \Phi_{h(i)}) \cdot P(len = \Psi(h(i) - i) \mid link(i, h(i)) = Yes,\ w_i \Phi_i w_{h(i)} \Phi_{h(i)}) \qquad (9)$$

3 The probability for distance 2 is not shown in the figure, and the
probability for long distance modification is marked as dist > 2.

4 m, n, and k are 0, 2, and 3 here; mag, pvg-etm, and nbn-jca are the POS tags
of the modifier wanjeonhi and the two words to its right in Figure 2.




Unlexicalized Dependency Parser for Variable Word Order Languages 






Since the denominator of (8) is constant, applying the chain rule gives (9).
The latter term of (9) is the probability of a certain modification length.
Since we assume that the modification length depends only on the modifier and
its context, and since we exclude all lexical information from the model, the
whole parsing model becomes:

$$P(dep_i = h(i) \mid S) \approx P(link(i, h(i)) = Yes \mid w_i \Phi_i w_{h(i)} \Phi_{h(i)}) \cdot P(len = \Psi(h(i) - i) \mid w_i \Phi_i)$$

$$\approx P(link(i, h(i)) = Yes \mid t_i \Phi_i t_{h(i)} \Phi_{h(i)}) \cdot P(len = \Psi(h(i) - i) \mid t_i \Phi_i)$$

As can be seen, the model is the product of the probability of a word
dependency between modifier and modifyee and the probability of the length of
the modification relation for the modifier, based on its local contextual
pattern.
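The product form can be sketched in a few lines. This is only an illustration
under stated assumptions: link_prob and dist_prob are hypothetical callables
standing in for the two estimated probability tables, and the probabilities are
assumed to be smoothed so that no factor is zero.

```python
import math

LONG = "long"

def bucket(ld, k):
    """Psi(ld): keep distances below k, merge the rest into 'long'."""
    return ld if ld < k else LONG

def arc_score(i, h, tags, link_prob, dist_prob, k=3):
    """P(link(i, h) = Yes | ...) * P(len = Psi(h - i) | ...) for one arc."""
    return link_prob(tags, i, h) * dist_prob(tags, i, bucket(h - i, k))

def tree_log_prob(heads, tags, link_prob, dist_prob, k=3):
    """Sum of log arc scores over every word that has a modifyee."""
    total = 0.0
    for i, h in enumerate(heads):
        if h is not None:
            total += math.log(arc_score(i, h, tags, link_prob, dist_prob, k))
    return total

# Toy probability functions (assumed values, not taken from the paper).
toy_link = lambda tags, i, h: 0.5
toy_dist = lambda tags, i, x: 0.8 if x == 1 else 0.1

toy_tags = ["T0", "T1", "T2"]
chain_score = tree_log_prob([1, 2, None], toy_tags, toy_link, toy_dist)
```

Working in log space keeps the product over all words of a long sentence from
underflowing.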

4 Related Works 

There has been little work on unlexicalizing the parsing model. Instead, many
studies have tried to combine various features, including lexicalized
information. The distance measure is one of the most widely used features in
dependency parsing. As shown earlier, [6] proposed a statistical parsing model
for English based on bigram lexical dependencies and the distance between the
two words of a dependency. [8, 9, 10] proposed similar models for parsing
Korean and Japanese.

However, using the distance features in the conditional part of the probability
equation 5, as [6] does, assumes that a dependency relation of a certain length
is different from dependency relations of other lengths. This assumption may
cause a sparse data problem in estimating word dependencies for languages that
allow variable word order. The sparseness would be more serious for models
that use lexical dependencies, such as [9, 10].

There are other approaches that use modification distance as we do, but in a
different way. [11, 12] utilized a handcrafted HPSG for dependency analysis of
Japanese. The HPSG is used to find three alternative modifyee candidates: the
nearest, the second nearest, and the farthest candidate from a certain
modifier. Then, the probabilistic models choose an appropriate modifyee among
the three candidates. These models seem to work well for Japanese; however, it
is doubtful that they can be applied well to other languages. The parsing
models are restricted to considering at most three head candidates, based on
statistics from Japanese corpora. So they may fit Japanese parsing but would
cause problems for parsing other languages. In addition, these approaches
require handcrafted grammars, which usually demand excessive manual labor.
These features can be obstacles when someone uses these models to develop a
new parser for another language.

In contrast, our model splits the probability of a dependency relation into the
word dependency probability and the modification distance probability to
alleviate the sparse data problem. The proposed model does not depend on
language-specific features, and it does not require any language-specific
manual rules, such as heuristic constraints or HPSG, either. So it can be
adapted to other languages easily. Moreover, our model does not ignore any
grammatically correct modifyee candidates, while [11] and [12] ignore less
likely grammatical modifyee candidates.

5 See Equation (2)



5 Experimental Results 

We implemented a probabilistic parser that uses the proposed parsing model
and performed some experiments to evaluate our method empirically. We used
a backward beam search algorithm, which was originally designed to analyze
Japanese with dependency probabilities [15].

The parser was trained on 27,694 sentences and tested on 3,386 held-out
sentences from the dependency-tagged sections of the KAIST Language Resources
[14]. All sentences are POS tagged, and this information was used as the input
for the parser.
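The general idea of a backward beam search for head-final languages can be
sketched as follows. This is not the algorithm of [15] as implemented in the
parser, only a minimal illustration under two assumptions: every word except
the last modifies exactly one word to its right, and dependency arcs do not
cross. The function name, the log_prob callable, and the toy probabilities are
hypothetical.

```python
import math

def backward_beam_search(n_words, log_prob, beam_width=5):
    """Search head assignments from the last word back to the first.

    log_prob(i, h) is the log probability that word i modifies word h (i < h).
    Returns (heads, score); heads[i] is None for the sentence-final word.
    """
    beam = [(0.0, {})]  # (score, heads chosen so far for words to the right)
    for i in range(n_words - 2, -1, -1):
        expanded = []
        for score, heads in beam:
            # Non-crossing candidates: i+1, head(i+1), head(head(i+1)), ...
            h = i + 1
            while h is not None:
                new_heads = dict(heads)
                new_heads[i] = h
                expanded.append((score + log_prob(i, h), new_heads))
                h = heads.get(h)
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    best_score, best_heads = beam[0]
    return [best_heads.get(i) for i in range(n_words)], best_score

# Toy arc probabilities for a four-word sentence (assumed values).
TOY = {(0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.7,
       (1, 2): 0.9, (1, 3): 0.1, (2, 3): 1.0}
toy_heads, toy_score = backward_beam_search(4, lambda i, h: math.log(TOY[(i, h)]))
```

Walking up the head chain from the right neighbour enumerates exactly the
modifyee candidates that keep the partial tree free of crossing arcs, so no
separate projectivity check is needed.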



5.1 Deciding Length of Modification Relation with the Modification 
Distance Probability 

First of all, we evaluate our assumption that the length of a modification
relation can be determined by a modifier and its local contextual pattern. To
do this, we built a classifier using the modification distance probability,
which models our assumption statistically. The classifier decides the length
of a modification relation for the given modifier $t_i$ and its context
$\Phi_i$:

$$modification\ distance = \arg\max_{d \in Dist} P(len = d \mid t_i \Phi_i) = \arg\max_{d \in Dist} P(len = d \mid t_{i-m} \cdots t_i \cdots t_{i+n}) \qquad (10)$$

We experimented with m from 0 to 2 and n from 0 to 3, while k changes from 2
to 4. We used the F1 measure for evaluating the classifier. The experimental
results are shown in Table 2. They show that considering a wider context does
not always yield more accurate classification. The best result is obtained
when m and n are 0 and 2. This means that the left context contributes little
to deciding modification distances 6. A right context size of 3 does not help
the classification either.

Meanwhile, the performance of the classifier increases as the value of k
decreases. This is because a smaller k reduces the number of distance classes,
which is k, and classification becomes easier with fewer, more generic classes.
Through this experiment we selected 0 and 2 as the values of m and n, but could
not decide the value of k. Although the classifier performs worse with a bigger
k, it might be more helpful for the parser to have probabilities for more
finely subdivided distances.
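The classifier of (10) amounts to an argmax over the k distance classes. The
sketch below is illustrative only: prob stands for any estimate of
P(len = d | context) and is a hypothetical callable, and the toy table reuses
the three probabilities reported for the adverb in Figure 2.

```python
LONG = "long"

def dist_classes(k):
    """Dist = {1, ..., k-1, long}; the number of classes equals k."""
    return list(range(1, k)) + [LONG]

def classify(prob, context, k):
    """Return the distance class d that maximizes P(len = d | context)."""
    return max(dist_classes(k), key=lambda d: prob(context, d))

# Probabilities reported for the context (mag, pvg-etm, nbn-jca) with k = 3.
FIG2 = {1: 0.9482, 2: 0.0, LONG: 0.0517}
context = ("mag", "pvg-etm", "nbn-jca")
predicted = classify(lambda ctx, d: FIG2[d], context, k=3)
```

Because dist_classes(k) grows with k, a larger k forces the classifier to
separate more classes, which is one way to read the falling scores from left
to right in Table 2.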



6 [13] reported that a similar characteristic is observed for Japanese too.
Based on his experiment with humans, it holds more than 90% of the time for
Japanese.
Table 2. Experimental results (in F1-score) for the modification distance
classifier, with various m (left context size), n (right context size), and k
(class size) values

context size   Dist
 m    n        {1, long} (k=2)   {1, 2, long} (k=3)   {1, 2, 3, long} (k=4)
 1    0        0.916             0.750                0.722
 2    0        0.787             0.747                0.721
 0    1        0.916             0.855                0.817
 1    1        0.898             0.835                0.799
 2    1        0.847             0.793                0.761
 0    2        0.926             0.879                0.838
 1    2        0.894             0.851                0.814
 2    2        0.816             0.775                0.748
 0    3        0.872             0.831                0.800
 1    3        0.831             0.793                0.770
 2    3        0.794             0.755                0.731



Figure 3 is an example that shows the effect of different values of k. The
upper arcs show the modification distance probability when k is 2, and the
lower arcs show the probability when k is 3. When k is 2, the only information
we can get from the modification distance probability is that the modifier
norae-reul (song-OBJ) does not modify the next word jal (well). However, this
can already be known from the simple dependency rule probability, because an
object noun never modifies an adverb. So the modification distance probability
is not helpful when k is 2. When k is 3, however, the modification distance
probability assigns a higher probability to a modification relation of length
2, which cannot be captured by the simple dependency probability. So we do not
fix the value of k here, but use all values of k in the following experiments.



5.2 Experiment with Richer Context 

We used the modification distance probability and added a little more context
information to the bare word dependency probability to achieve higher parser
performance. Here, we evaluate the effect of the information we added to the
vanilla dependency probability.

To evaluate the performance of the parser, we used the arc-based F1 measure and
the sentence-based exact matching rate. The results are shown in Table 3. They
show that both kinds of additional contextual information (OC and MDP)
contribute to parser performance. An interesting point here is that parser
performance increases as k gets bigger, whereas in the previous experiment the
classifier performed worse for bigger k values. This is because a larger k is
more helpful for the parser in deciding the appropriate modifyee, as discussed
in the previous experiment with Figure 3. The table also shows that using the
modification distance probability (BM-Δ+MDP) is better than using the distance
measure as in [6] (BM). This
when k = 2 : Dist = {1, long}
    P(dist = 1 | ...) = 0%    P(dist = long | ...) = 100%

norae-reul  jal   bureu-neun  saram-ege
(song-OBJ   well  singing     person-DAT)

when k = 3 : Dist = {1, 2, long}

Fig. 3. Comparison of the modification distance probability from the word
norae-reul, when k = 2 and k = 3. As k gets bigger, the modification distance
probability may be more helpful.

Table 3. Effect of considering outer context and modification distance. BM
stands for the unlexicalized parser with the model (3). Δ, OC, and MDP stand
for the distance measure used in [6], outer context, and modification distance
probability, respectively

                                               BM-Δ+MDP             BM-Δ+OC+MDP
Set       Measure      BM     BM-Δ   BM-Δ+OC   k=2    k=3    k=4    k=2    k=3    k=4
Training  Arc Prec.    0.801  0.748  0.783     0.812  0.824  0.831  0.854  0.862  0.865
Training  Exact Match  0.212  0.181  0.242     0.246  0.273  0.291  0.336  0.350  0.354
Testing   Arc Prec.    0.801  0.747  0.762     0.807  0.819  0.826  0.842  0.853  0.856
Testing   Exact Match  0.200  0.178  0.223     0.225  0.246  0.266  0.310  0.321  0.322



means that handling modification distance as in this paper works better than
the method used in other work.
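The two evaluation measures can be computed as in the following minimal sketch.
It assumes that the parser assigns exactly one modifyee to every word that has
one in the gold standard, in which case arc precision, recall, and F1 coincide
with plain arc accuracy; the function name and the toy data are hypothetical.

```python
def evaluate(gold, predicted):
    """Return (arc accuracy, exact match rate) over parallel lists of head lists.

    Each head list gives, per word, the index of its modifyee (None for the
    sentence-final word, which is not counted as an arc).
    """
    total_arcs = correct_arcs = exact = 0
    for g_heads, p_heads in zip(gold, predicted):
        arcs = [(g, p) for g, p in zip(g_heads, p_heads) if g is not None]
        total_arcs += len(arcs)
        correct_arcs += sum(1 for g, p in arcs if g == p)
        exact += int(all(g == p for g, p in arcs))
    return correct_arcs / total_arcs, exact / len(gold)

# Two toy sentences: one wrong arc in the first, the second fully correct.
toy_gold = [[2, 2, 3, None], [1, None]]
toy_pred = [[1, 2, 3, None], [1, None]]
arc_acc, exact_rate = evaluate(toy_gold, toy_pred)
```

A single wrong arc already removes a sentence from the exact match count, which
is why the exact match figures in Table 3 are so much lower than the arc-level
figures.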

5.3 Comparison with Other Parsers 

We compared our parser with some lexicalized parsers, namely the parsers from
[6] and [11] 7. The results are shown in Table 4.

On the training set, the parser from [6] achieves a score of almost 100%. This
is because the parser is highly lexicalized. The parser from [11], which uses
a triplet/quadruplet model, assumes that the modifyee of a word is one of the
nearest, the second nearest, or the last modifyee candidate. Unfortunately,
according to our investigation, only 91.48% of modifyees are among these three
candidates in the training data. This restriction causes the poor performance
on the training data even though the model is lexicalized.

7 The parser of [11] requires a handcrafted grammar (HPSG). Instead of HPSG, we
used as the grammar a set of dependency rules whose frequency is more than one
in the training corpus (e.g. Treebank grammar [16]).







Table 4. Results of the comparison with other lexicalized models. Many
statistical parsing models dealing with a distance measure, such as [9, 10],
resemble the model of [6]

Set           Measure         Parser from [6]   Parser from [11]   This Paper (k=4)
Training Set  Arc F1 score    0.996             0.908              0.865
Training Set  Exact Matching  0.966             0.555              0.354
Testing Set   Arc F1 score    0.837             0.843              0.856
Testing Set   Exact Matching  0.256             0.303              0.322



In contrast, our model performs better than the other, lexicalized models on
the testing data. The improvements in arc-level performance (+1.9% over [6]
and +1.3% over [11], absolute) are statistically meaningful.

This result shows that lexical dependency information may be useful for
accurate parsing, but that the proper use of other contextual information may
be more helpful 8. It also means that the method we used to deal with the
length of a modification relation is effective for syntactic disambiguation.

6 Conclusion 

We investigated the effect of unlexicalizing a dependency parser for variable
word order languages and proposed a new parser, which is unlexicalized to keep
it light, simple, and robust against the data sparseness problem. It utilizes
some POS-level information to keep accuracy as high as that of lexicalized
parsers. In particular, we suggest using the modification distance probability
to reflect the preference for the length of a modification relation for a
given modifier and its contextual pattern. The experimental results show that
our model outperformed other, lexicalized models for parsing Korean, which is
a free word order language. Since it does not use any language-specific
predefined rules, the proposed parser can be easily adapted to other variable
word order languages.

We do not say that lexical information is worthless. However, ignoring lexical
information in a parser can give some advantages (simpler parser
implementation, smaller disk space, and shorter processing time) without
sacrificing much accuracy, and these advantages may be useful in some cases,
e.g. when developing a parser for a system with limited memory size or
processing speed.

We found that lexicalization plays a bigger role in parsing with a
probabilistic dependency grammar, but we have not yet deeply investigated the
cause of this. We will continue to investigate it. There is also some work
that does not assume independence between dependency relations, such as [17].
We are going to reconstruct our parsing model without the independence
assumption in the future.

8 In addition, our unlexicalized parser requires a much smaller amount of
frequency data for estimating the probabilities. While the lexicalized parsers
require 643M ([6]'s) and 540M ([11]'s) bytes for storing the data, our parser
uses only 18M bytes of data. We have not tried to optimize the data structure.
But even taking that into account, the huge difference in the required
resource size gives some idea of why an unlexicalized parser is preferable.
Appendix

Brief Introduction to Korean Syntax 

Two prominent characteristics of Korean are agglutinative morphology, and 
rather free word order with explicit case marking [7]. 
eoje        Eugene-i    show-reul   bo-at-da.
(yesterday  Eugene-SBJ  show-OBJ    watched)

eoje        show-reul   Eugene-i    bo-at-da.
(yesterday  show-OBJ    Eugene-SBJ  watched)

Fig. 4. Dependency trees for Korean sentences which have the identical meaning
"Eugene watched a show yesterday."



Korean is an agglutinative language, in which a word 9 is in general composed
of more than one morpheme. There are two types of morpheme: content morphemes
and functional morphemes. A content morpheme carries the meaning of the word,
while a functional morpheme acts as a grammatical information marker, which
indicates the grammatical role, tense, modality, voice, etc. of the word.

Word order is relatively free in Korean compared to fixed-order languages such
as English. The grammatical information conveyed by the functional morphemes
allows the word order to be free. The following simple example is a Korean
sentence consisting of 4 words 10.

eoje Eugene-i show-reul bo-at-da.

yesterday Eugene-SBJ a show-OBJ watched
Eugene watched a show yesterday.

The second word in the sentence is Eugene-i. It consists of the content
morpheme Eugene and the functional morpheme i, which is a subject case marking
postposition. The sentence can be rewritten as:

eoje show-reul Eugene-i bo-at-da.

yesterday a show-ACC Eugene-NOM watch-PAST-END
Eugene watched a show yesterday.

Though the subject and the object exchange their positions, the two sentences
have identical meaning. Because of this property of Korean, dependency grammar
is widely used for analyzing the syntactic structure of the Korean language.
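The point that case markers, not positions, carry the grammatical roles can be
shown with a toy sketch. It is not a morphological analyser: the two-entry
marker table and the function name are assumptions made for this example, and
a word is simply split at its last hyphen in the romanized form used above.

```python
# Hypothetical marker table covering only the two postpositions in the example.
CASE_MARKERS = {"i": "SBJ", "reul": "OBJ"}

def grammatical_roles(sentence):
    """Map each case-marked word to its role, ignoring word position."""
    roles = {}
    for word in sentence.rstrip(".").split():
        stem, _, suffix = word.rpartition("-")
        role = CASE_MARKERS.get(suffix)
        if stem and role:
            roles[role] = stem
    return roles

first = grammatical_roles("eoje Eugene-i show-reul bo-at-da.")
second = grammatical_roles("eoje show-reul Eugene-i bo-at-da.")
```

Both word orders yield the same role assignment, which is the property that
makes position-independent dependency analysis a natural fit for Korean.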

Since the grammatical relation of a dependency relation can in most cases be
specified by the functional morpheme of the modifier, selecting the modifyee
word for each modifier is the main concern of dependency parsing for Korean.
Figure 4 shows dependency structure trees for the Korean sentences shown above.



9 The exact term for the word is eojeol. However, we use the term word for
easier understanding.

10 SBJ and OBJ stand for subjective and objective case.
Abstract. This article presents a robust syntactic analyser for Basque and the
different modules it contains. Each module is structured in different analysis
layers, in which each layer takes the information provided by the previous layer
as its input, thus creating a gradually deeper syntactic analysis in cascade. This
analysis is carried out using the Constraint Grammar (CG) formalism.
Moreover, the article describes the standardisation process of the parsing
formats using XML.



1 Introduction 

This article describes the steps we have followed for the construction of a robust 
cascaded syntactic analyser for Basque. Robust parsing is understood as “the ability of
a language analyser to provide useful analyses for real-world input texts. By useful
analyses, we mean analyses that are (at least partially) correct and usable in some
automatic task or application” (Ait-Mokhtar et al., 2002). The creation of the robust
analyser is performed based on a shallow parser. In this approach, incomplete 
syntactic structures are produced and thus the process goes beyond shallow parsing to 
a deeper language analysis in an incremental fashion. This allows us to tackle 
unrestricted text parsing through descriptions that are organized in ordered modules, 
depending on the depth level of the analysis (see Fig. 1). 

In agglutinative languages like Basque, it is difficult to separate morphology from 
syntax. That is why we consider morphosyntactic parsing for the first phase of the 
shallow syntactic analyser, which, in turn, will provide the basis for a deeper syntactic 
analysis. 

In section 2 we briefly describe the main features of Basque. The steps followed in 
the process of creation of the cascaded parser are presented in section 3. Section 4 
explains how the information is encoded in XML following the Text Encoding 
Initiative (TEI) guidelines. Finally, some conclusions and objectives for future work 
are presented. 

A. Gelbukh (Ed.): CICLing 2004, LNCS 2945, pp. 124-134, 2004. 

© Springer-Verlag Berlin Heidelberg 2004 
2 Main Features of Basque

Basque is not an Indo-European language and differs considerably in grammar from 
the languages spoken in surrounding regions. It is an inflectional language in which 
grammatical relations between components within a clause are represented by 
suffixes. This is a distinguishing feature since the morphological information that 
words contain is richer than in surrounding languages. Given that Basque is a
head-final language at the syntactic level, the morphological information of the
phrase (number, case, etc.), which is considered to be the head, is in the attached
suffix. That is why morphosyntactic analysis is essential. In fact, Basque is known
as a free-order language.



3 Syntactic Processing of Basque: The Steps Followed 

We approach the creation of a robust syntactic analyser by implementing it in
sequential rule layers. In most cases, these layers are realized in grammars defined by the
Constraint Grammar formalism (Karlsson et al., 1995; Tapanainen & Voutilainen, 
1994). Each analysis layer uses the output of the previous layer as its input and 
enriches it with further information. Rule layers are grouped into modules depending 
on the level of depth of their analysis. Modularity helps to maintain linguistic data and 
makes the system easily customisable or reusable. 

Figure 1 shows the architecture of the system. The shallow parsing of the text 
begins with the morphosyntactic analysis. The information obtained is then separated 
into noun and verb chains. Finally, the deep analysis phase establishes the 
dependency-based grammatical relations between the components within the clause. 
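The cascade described above, in which each layer consumes and enriches the
output of the previous one, can be sketched as a simple function pipeline.
This is only an illustration of the control flow: the layer functions and the
annotation values are hypothetical placeholders and do not reproduce the
Constraint Grammar rules of the actual system.

```python
from functools import reduce

def morphosyntactic_layer(tokens):
    # Placeholder: attach a morphosyntactic analysis to every token.
    return [dict(tok, morph="analysed") for tok in tokens]

def chain_layer(tokens):
    # Placeholder: needs the morphosyntactic output to delimit noun/verb chains.
    assert all("morph" in tok for tok in tokens)
    return [dict(tok, chain="tagged") for tok in tokens]

def dependency_layer(tokens):
    # Placeholder: needs the chains to establish dependency relations.
    assert all("chain" in tok for tok in tokens)
    return [dict(tok, dep="tagged") for tok in tokens]

def run_cascade(tokens, layers):
    """Feed the output of each layer to the next one, in order."""
    return reduce(lambda data, layer: layer(data), layers, tokens)

LAYERS = [morphosyntactic_layer, chain_layer, dependency_layer]
result = run_cascade([{"form": "enbata"}, {"form": "iristen"}, {"form": "da"}], LAYERS)
```

Keeping every layer as a separate function mirrors the modularity argument
made above: a layer can be replaced or reordered without touching the others,
as long as the annotations it expects are present.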

The results obtained in each parsing level of the sentence Noizean behin itsaso 
aldetik Donostiako Ondarreta hondartzara enbata iristen da ‘Once in a while, a 
storm arrives from high seas to the Ondarreta beach in Donostia’ will help in 
providing a better understanding of the mentioned parsing process. 



3.1 Applied Formalism 

The parsing system is based on finite state grammars. The Constraint Grammar (CG) 
formalism has been chosen in most cases because, on the one hand, it is suitable for 
treating unrestricted texts and, on the other hand, it provides a useful methodology 
and the tools to tackle morphosyntax as well as free order phrase components in a 
direct way. The analyser used is CG-2 (www.conexor.com). 

A series of grammars is implemented within the shallow parsing module, with
the following aims:

1. To be useful for the disambiguation of grammatical categories, removing incorrect 
tags based on the context; 

2. To assign and disambiguate partial syntactic functions; 

3. To assign the corresponding tags to delimit verb and noun chains. 

Besides, dependency-based parsing is made explicit in the deep parsing module by 
means of grammars similar to those used in the shallow parsing module. 

Even though CG originally uses mapping rules to assign the syntactic functions of 
grammatical categories defined by the context, in the above-mentioned modules these 
rules assign the corresponding syntactic tags to each analysis level. An example of a 
rule defined to detect the beginning of noun chains is shown below: 



MAP (%INIT_NCH) TARGET (NOUN) IF (0 (GEN-GEL) + (@NC))
                               (-1 PUNCT)
                               (1 NOUN OR ADJ OR DET);





